Assignment 3 – Text Analysis
60-141 – Introduction to Programming II
The availability of computers with string-manipulation capabilities has resulted in some rather interesting approaches to analyzing the writings of great authors. Much attention has been focused on whether William Shakespeare ever lived. Some scholars find substantial evidence that Christopher Marlowe actually penned the masterpieces attributed to Shakespeare. Researchers have used computers to find similarities in the writings of these two authors. Regardless of this particular controversy, this assignment examines three methods for analyzing texts with a computer.
Your task is to:
Write a complete, well-documented C program that reads several lines of text and prints three tables indicating:
1) the number of occurrences of each letter of the alphabet in the complete text
2) the number of one-letter words, two-letter words, three-letter words, and so on, appearing in the complete text
3) the number of occurrences of each different word in the complete text
Note that the term “complete text” used above refers to all characters in all lines of the input text.
Requirements and Hints:
1. Given several lines of text, your program should implement at least the following functions (in other words, additional functions may be useful, depending on the approach used):
a) void letterAnalysis( ), that gets several lines of text and the number of lines of text as input parameters and prints a table indicating the number of occurrences of each letter of the alphabet in the complete text.
b) int wordLengthAnalysis( ), that gets several lines of text, the number of lines of text and a length as input parameters and returns the number of occurrences of words with that length appearing in the text. The main() function should call this function with different word lengths and then print a table indicating the number of one-letter words, two-letter words, three-letter words, and so on, in the text. The maximum word length may be assumed to be 20 letters.
c) void wordAnalysis( ), that gets several lines of text and the number of lines of text as input parameters and prints a table indicating the number of occurrences of each different word in the text. The program should include the words in the table in the same order in which they appear in the text.
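The letter count in (a) comes down to indexing a 26-element array by each letter's offset from 'a'. Below is a minimal sketch of that idea only; the helper name count_letters is hypothetical (it is not one of the required functions), and it assumes ASCII text in the default C locale.

```c
#include <ctype.h>

/* Hypothetical helper: adds the letter frequencies of `text` to counts[0..25],
   where counts[0] is 'a' and counts[25] is 'z'. The caller zeroes the array
   once and may then call this function for every line of the text. */
void count_letters(const char *text, int counts[26])
{
    for (; *text != '\0'; text++) {
        /* <ctype.h> functions require a value representable as unsigned char */
        unsigned char c = (unsigned char) *text;
        if (isalpha(c)) {
            int idx = tolower(c) - 'a';
            if (idx >= 0 && idx < 26)   /* guards against non-ASCII letters */
                counts[idx]++;
        }
    }
}
```

Calling count_letters once per line with the same counts array accumulates the totals for the complete text, which letterAnalysis( ) can then print.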
Input:
Input is a number (say, int N) and several (i.e. N) lines of text from a file (using input redirection) or from the user (using standard keyboard input). For this, first the number (N) of text lines should be read, and then each line of text should be read individually. The maximum number of lines is 10 but
each text line might have different lengths (however, the maximum number of characters in any
individual line is 80).
Hint: You can use a two dimensional array or an array of pointers to save the text lines.
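One way to follow this hint is sketched below, using a two-dimensional array. The names read_lines, MAX_LINES and LINE_BUF are hypothetical and chosen for illustration only. The buffer is 82 bytes wide so that an 80-character line still has room for the newline and the terminating '\0'.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINES 10   /* limit stated in the assignment */
#define LINE_BUF  82   /* 80 characters + '\n' + '\0' */

/* Hypothetical helper: reads the line count N from `in`, then up to N lines.
   Returns the number of lines actually stored (at most max_lines). */
int read_lines(FILE *in, char lines[][LINE_BUF], int max_lines)
{
    char header[LINE_BUF];
    int n, stored = 0;

    if (fgets(header, sizeof header, in) == NULL)
        return 0;
    n = (int) strtol(header, NULL, 10);   /* handles multi-digit counts such as 10 */
    if (n > max_lines)
        n = max_lines;

    while (stored < n && fgets(lines[stored], LINE_BUF, in) != NULL) {
        lines[stored][strcspn(lines[stored], "\n")] = '\0';   /* drop the newline */
        stored++;
    }
    return stored;
}
```

Because the function takes a FILE pointer, calling read_lines(stdin, text, MAX_LINES) works the same way for keyboard input and for input redirected from a file.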
For example
4
To be, or not to be? That is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
Output:
The table below is intended to illustrate the output for this assignment, but it is presented for visual
convenience only. Your program will actually produce the output reports so that various outputs
are generated one after another (First output, followed by Second output, followed by Third output).
In particular, note the formatting issues to be dealt with. For the First output, the letter count must
be right justified in a column of prescribed width. For the Second output, the word “word” is used
only if the number of words is 1, otherwise “words” is used. Finally, for the Third output, “times”
is used only when the number of occurrences is greater than 1.
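The three formatting rules can be sketched with printf-style format strings. The helper names and the exact row wording below are assumptions (the prescribed column width is not given in the text, so it is passed in as a parameter); the sketch only illustrates right-justifying with %*d and choosing between the singular and plural forms.

```c
#include <stdio.h>

/* Hypothetical helper: first table row, count right-justified in `width` columns */
int format_letter_row(char *buf, size_t size, char letter, int count, int width)
{
    return snprintf(buf, size, "%c: %*d", letter, width, count);
}

/* Hypothetical helper: second table row, "word" only when count is exactly 1 */
int format_length_row(char *buf, size_t size, int count, int length)
{
    return snprintf(buf, size, "%d %s of length %d",
                    count, count == 1 ? "word" : "words", length);
}

/* Hypothetical helper: third table row, "times" only when count is above 1 */
int format_word_row(char *buf, size_t size, const char *word, int count)
{
    return snprintf(buf, size, "\"%s\" appeared %d %s",
                    word, count, count > 1 ? "times" : "time");
}
```

Each helper writes one row into a caller-supplied buffer, which can then be printed with puts(buf); the same format strings can also be passed directly to printf.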
[Table with three columns: First output (Total letter counts), Second output, Third output]
Note: You do not have to separate punctuation marks (such as a comma, period, or question mark)
from the words. For example, both “be,” and “be?” are counted as three-letter words above.
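The effect of this rule can be seen by splitting a line on whitespace only, so that punctuation stays attached to its word. The following is a small sketch under that assumption; count_words_of_length and LINE_BUF are hypothetical names used for illustration.

```c
#include <string.h>

#define LINE_BUF 82   /* 80 characters + '\n' + '\0' */

/* Hypothetical helper: counts the whitespace-separated words in `line`
   whose length (punctuation included) equals `length`. */
int count_words_of_length(const char *line, size_t length)
{
    char copy[LINE_BUF];
    char *token;
    int count = 0;

    /* strtok modifies its argument, so tokenize a private copy */
    strncpy(copy, line, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    for (token = strtok(copy, " \t\n"); token != NULL; token = strtok(NULL, " \t\n")) {
        if (strlen(token) == length)
            count++;
    }
    return count;
}
```

For the first sample line, "To be, or not to be? That is the question:", the three-character words found this way are "be,", "not", "be?" and "the".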
Requirements:
– Write and document a complete C program that is capable of satisfying the requirements of this
assignment problem.
Answer:
/*
 Title: Assignment #3: Text Analysis
 Objective: This program reads several lines of text and prints three tables indicating:
 1) the number of occurrences of each letter of the alphabet in the complete text
 2) the number of one-letter words, two-letter words, three-letter words, and so on,
    appearing in the complete text
 3) the number of occurrences of each different word in the complete text.
*/

//C-Preprocessor Directives
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORDLENGTH 20        //longest word length reported in the second table
#define MAX_NUMBER_LINES 10
#define LINELENGTH 82        //80 characters plus '\n' plus '\0'
#define MAX_WORDS (MAX_NUMBER_LINES * (LINELENGTH / 2))
#define DELIMITERS " \t\r\n" //words are separated by whitespace only
#define COUNT_WIDTH 4        //column width for the letter counts (assumed; not given in the text)

//Function Prototypes
void letterAnalysis(char [][LINELENGTH], int rows);
int wordLengthAnalysis(char [][LINELENGTH], int rows, int wordLength);
void wordAnalysis(char [][LINELENGTH], int rows);

int main(void)
{
    int rows, m, n;
    char fileTxt[MAX_NUMBER_LINES][LINELENGTH], lineChar[LINELENGTH];

    //Read the number of lines; strtol also handles multi-digit values such as 10
    if (fgets(lineChar, sizeof lineChar, stdin) == NULL)
        return EXIT_FAILURE;
    rows = (int) strtol(lineChar, NULL, 10);
    if (rows < 0)
        rows = 0;
    if (rows > MAX_NUMBER_LINES)
        rows = MAX_NUMBER_LINES;

    //Read the text lines into the array; stop early if the input runs out
    for (m = 0; m < rows; m++) {
        if (fgets(fileTxt[m], LINELENGTH, stdin) == NULL)
            break;
    }
    rows = m;

    printf("\n_________________________\n");
    printf("\n First output \n");
    printf("\n_________________________\n");
    printf("\n%s\n", "Total letter counts: ");
    printf("\n~~~~~~~~~~~~~~~~~~~~~~~~~\n");
    letterAnalysis(fileTxt, rows);

    printf("\n_________________________\n");
    printf("\n Second output \n");
    printf("\n_________________________\n");
    printf("%s\n", "\nWord length");
    printf("\n~~~~~~~~~~~~~~~~~~~~~~~~~\n");
    //Word lengths start at 1; call the function once per length and reuse the result
    for (n = 1; n <= WORDLENGTH; n++) {
        int count = wordLengthAnalysis(fileTxt, rows, n);
        if (count == 1)
            printf("%d word of length \t%d\n", count, n);
        else if (count > 1)
            printf("%d words of length \t%d\n", count, n);
    }

    printf("\n_________________________\n");
    printf("\n Third output ");
    printf("\n_________________________\n");
    printf("%s\n", "\nWord count");
    printf("\n~~~~~~~~~~~~~~~~~~~~~~~~~\n");
    wordAnalysis(fileTxt, rows);
    return 0;
}

/*
 Objective: This function is used to indicate the number of occurrences of each letter of the alphabet.
 Input: Several lines of text and number of lines of text.
 Output: Table showing number of occurrences of each letter of the alphabet.
*/
void letterAnalysis(char fileTxt[][LINELENGTH], int rows)
{
    int alphaChar[26] = {0}, k;
    size_t l;

    for (k = 0; k < rows; k++) {
        for (l = 0; fileTxt[k][l] != '\0'; l++) {
            //Cast to unsigned char: the ctype functions are undefined for negative values
            unsigned char c = (unsigned char) fileTxt[k][l];
            //Count case-insensitively without modifying the caller's text
            if (isalpha(c)) {
                int idx = tolower(c) - 'a';
                if (idx >= 0 && idx < 26)
                    alphaChar[idx]++;
            }
        }
    }
    //Print each count right justified in a column of COUNT_WIDTH characters
    for (k = 0; k < 26; k++) {
        printf("%c: %*d\n", 'a' + k, COUNT_WIDTH, alphaChar[k]);
    }
}

/*
 Objective: This function counts occurrences of words with the length passed to this function.
 Input: Several lines of text, number of lines of text and a length.
 Output: Returns the number of occurrences of words with the length passed as input.
*/
int wordLengthAnalysis(char otext[][LINELENGTH], int rows, int word_Length)
{
    int i, wordCount = 0;
    char line[LINELENGTH];

    for (i = 0; i < rows; i++) {
        //strtok modifies its argument, so tokenize a copy of the line
        strcpy(line, otext[i]);
        char *token = strtok(line, DELIMITERS);
        while (token != NULL) {
            if ((int) strlen(token) == word_Length)
                wordCount++;
            //Passing NULL makes strtok continue with the next token of the same line
            token = strtok(NULL, DELIMITERS);
        }
    }
    return wordCount;
}

/*
 Objective: This function counts occurrences of each different word.
 Input: Several lines of text and number of lines of text.
 Output: Table showing the number of occurrences of each different word,
         in the order in which the words first appear in the text.
*/
void wordAnalysis(char otext[][LINELENGTH], int rows)
{
    int i, j, h, wordCount = 0;
    char fileTxt[MAX_NUMBER_LINES][LINELENGTH];
    char *words[MAX_WORDS];       //pointers to the tokens inside fileTxt
    int counted[MAX_WORDS] = {0}; //1 once a word has been merged into an earlier one

    for (i = 0; i < rows && i < MAX_NUMBER_LINES; i++) {
        strcpy(fileTxt[i], otext[i]);
        //Lowercase the copy so that "To" and "to" count as the same word
        //(the original version relied on letterAnalysis lowercasing the text in place)
        for (j = 0; fileTxt[i][j] != '\0'; j++)
            fileTxt[i][j] = (char) tolower((unsigned char) fileTxt[i][j]);
        char *token = strtok(fileTxt[i], DELIMITERS);
        while (token != NULL && wordCount < MAX_WORDS) {
            words[wordCount++] = token;
            token = strtok(NULL, DELIMITERS);
        }
    }

    //Include the last word as well: it may be unique
    for (j = 0; j < wordCount; j++) {
        if (counted[j])
            continue;
        int same = 1;
        for (h = j + 1; h < wordCount; h++) {
            //strcmp returns 0 when the two strings are equal
            if (!counted[h] && strcmp(words[j], words[h]) == 0) {
                same++;
                counted[h] = 1;
            }
        }
        if (same == 1) {
            printf("\"%s\"\t appeared %d time\n", words[j], same);
        } else {
            printf("\"%s\"\t appeared %d times\n", words[j], same);
        }
    }
}