EWA Systems offers text mining, text search, natural language processing, and text conceptual analysis.
Text Mining V4.1
EWA Systems' text mining package takes the classical word-frequency approach, which reduces each document to a set of words and the number of times each word appears in the document. Typically, this word frequency list is sufficient to capture the gist of the document.
The text mining package includes everything needed to reduce text documents, such as plain text, HTML, XML, and PDF files, into word frequency lists. These word frequency lists are then combined to form a data table, which extracts the desired features (words) from the word frequency lists being compared. The resulting table is square and contains the word frequency probability for each of the featured words.
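The reduction described above can be sketched in a few lines of Python. This is a minimal illustration, not EWA Systems' implementation: the function names `word_frequencies` and `frequency_table`, the tokenizing regular expression, and the sample documents are all assumptions made for the example, and "word frequency probability" is interpreted here as a word's count divided by the total number of words in the document.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Reduce a document to a mapping of lowercase word -> count."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def frequency_table(documents, features):
    """Build one row per document with the relative frequency of each feature word."""
    table = []
    for text in documents:
        counts = word_frequencies(text)
        total = sum(counts.values()) or 1  # avoid dividing by zero for empty documents
        table.append([counts[word] / total for word in features])
    return table

# Hypothetical sample documents and feature words.
docs = ["The cat sat on the mat.", "The dog chased the cat."]
features = ["cat", "dog", "mat"]
table = frequency_table(docs, features)

for row in table:
    print([round(value, 3) for value in row])
```

Each row of `table` corresponds to one document and each column to one feature word, which is the shape a downstream data mining algorithm would typically consume.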
The list is further simplified by discarding the most common words, like "a", "the", and "and", which carry little meaning, and the least common words, which typically relate to side references.
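As a rough sketch of this pruning step, the snippet below drops stop words and words that occur fewer than a minimum number of times. The stop word set, the `min_count` threshold, and the `prune` function name are illustrative assumptions, not the package's actual list or settings.

```python
from collections import Counter

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in"}

def prune(counts, stop_words=STOP_WORDS, min_count=2):
    """Keep words that are neither stop words nor rarer than min_count."""
    return Counter({
        word: count
        for word, count in counts.items()
        if word not in stop_words and count >= min_count
    })

# Hypothetical word frequency list for one document.
counts = Counter({"the": 12, "and": 7, "mining": 5, "text": 4, "footnote": 1})
pruned = prune(counts)
print(dict(pruned))
```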
This package supports the following operations:
- Classification: Sorting of documents into categories. For example, sorting email messages by topic into specific help categories for auto response.
- Clustering: Discovery of groups of texts that are fundamentally different. For example, grouping of similar questionnaire responses and reporting on the differences between the groups.
- Association: Discovery of the common elements of similar documents. For example, documents that discuss topic "A" are 90% likely to discuss topic "B" as well.
- Sorting: Sorting documents from most alike to least alike, or its reverse, compared to a specified document or a document signature. For example, dynamic ranking of resumes in order to find top candidates faster.
These operations work across a wide range of structured and unstructured text formats, such as plain text, HTML, XML, Word, and PDF documents. The text mining algorithms include features to correct for typos and phonetic spellings, and use word rooting and a thesaurus to recognize words with similar meanings.
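To make the Sorting operation concrete, here is a small sketch that ranks documents from most alike to least alike relative to a reference document. It uses cosine similarity over raw word counts, which is one common choice and is assumed here for illustration; the source does not say which measure the package uses. The function names and the sample job description and resumes are hypothetical.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Reduce a document to a word -> count mapping."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(reference, candidates):
    """Return (score, text) pairs sorted from most alike to least alike."""
    ref = word_counts(reference)
    scored = [(cosine_similarity(ref, word_counts(text)), text) for text in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical job description and resumes.
job = "python developer with text mining experience"
resumes = [
    "chef with pastry experience",
    "python developer experienced in text mining",
    "java developer",
]
ranking = rank_by_similarity(job, resumes)

for score, text in ranking:
    print(f"{score:.3f}  {text}")
```

Passing `reverse=False` to `sorted` would give the opposite ordering, from least alike to most alike.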
The text mining package uses EWA Systems' data mining package for its calculations. The wide range of algorithms available in the data mining package ensures that the text mining problem can be solved with the algorithm that achieves the right accuracy/speed balance for you. The most commonly used algorithms, all of which are available in the data mining package, include k-Nearest Neighbors, Neural Networks, Decision Trees, and Support Vector Machines.
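As an illustration of how one of these algorithms applies to word frequency data, the following sketch classifies a message with k-Nearest Neighbors. It is a toy version written from scratch, not the EWA Systems data mining API: the helper names, the cosine similarity measure, the value of `k`, and the labeled sample messages are all assumptions for the example.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Reduce a document to a word -> count mapping."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_classify(text, labeled_examples, k=3):
    """Label a text by majority vote among its k most similar labeled examples."""
    query = word_counts(text)
    scored = sorted(
        ((cosine_similarity(query, word_counts(example)), label)
         for example, label in labeled_examples),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled help-desk messages.
training = [
    ("cannot log in password reset", "account"),
    ("forgot my password", "account"),
    ("reset password link expired", "account"),
    ("invoice shows wrong amount", "billing"),
    ("charged twice on my invoice", "billing"),
    ("refund for wrong charge", "billing"),
]

print(knn_classify("my password reset email never arrived", training))
print(knn_classify("wrong amount on invoice", training))
```

k-Nearest Neighbors needs no training phase but compares each query against every stored example, which is one instance of the accuracy/speed trade-off mentioned above.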
This Text Mining Architecture document details the core classes of this package and how they are used to mine textual information.
Text Search Features
EWA Systems' text search package enables the rapid search of text documents for specific words, groups of words, or text associated with a specific object or action, across a wide range of structured and unstructured text formats, such as plain text, HTML, XML, Word, and PDF documents. These text search algorithms include features to correct for typos and phonetic spellings, and use word rooting and a thesaurus to recognize words with similar meanings. The algorithms support multiple languages across documents and even within the same sentence.
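The typo and phonetic tolerance described above can be approximated with standard techniques. The sketch below uses Python's `difflib.get_close_matches` for near-miss spellings and a compact American Soundex encoding for phonetic spellings. These are stand-ins chosen for illustration; the source does not state which methods the package uses, and the function names, the 0.8 cutoff, and the sample vocabulary are assumptions.

```python
import difflib

# Soundex digit for each consonant group; vowels, h, w, and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(word):
    """Return the four-character American Soundex code of a word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    encoded = []
    previous = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != previous:
            encoded.append(code)
        previous = code  # a vowel resets this, so a repeated code after it is kept
    return (word[0].upper() + "".join(encoded) + "000")[:4]

def spelling_matches(term, vocabulary, cutoff=0.8):
    """Vocabulary words whose spelling is close to the (possibly mistyped) term."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=3, cutoff=cutoff)

def phonetic_matches(term, vocabulary):
    """Vocabulary words that share the term's Soundex code."""
    target = soundex(term)
    return [word for word in vocabulary if soundex(word) == target]

# Hypothetical index vocabulary.
vocabulary = ["juliet", "romeo", "dagger", "tybalt"]
print(spelling_matches("daggar", vocabulary))
print(phonetic_matches("jooleeyet", vocabulary))
```

A search front end could try an exact match first, then fall back to the spelling and phonetic matchers to expand the query terms.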
In its most basic usage, one could search The Complete Works of Shakespeare, for instance, for sentences that include "Juliet", or more specifically those that also refer to a "dagger". Searches can also target specific actions, such as "to stab", which would return instances of swordplay and backstabbing, weighted according to how closely the verb used in the text, such as "to impale", matches the nuance of "to stab". One can also search for linkages where two characters interact, such as "Romeo" and "Juliet", which would return all lines where "Romeo" is the subject of the sentence and "Juliet" is the object, or, if desired, the reverse. The same search could just as easily be applied to a set of technical papers, legal briefs, or intelligence documents.
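The most basic form of this search, finding sentences that mention one term and optionally others, can be sketched as follows. This is a plain keyword filter written for illustration, with hypothetical function names and an invented sample text (not quotations from the plays); the verb weighting and subject/object linkage described above would require sentence parsing and are not attempted here.

```python
import re

def split_sentences(text):
    """Split text into sentences at ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def search_sentences(text, *terms):
    """Return the sentences that contain every one of the given terms."""
    wanted = [term.lower() for term in terms]
    hits = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if all(term in words for term in wanted):
            hits.append(sentence)
    return hits

# Invented sample text for illustration only.
text = "Juliet wakes in the tomb. Juliet takes the dagger. Romeo speaks to Juliet."

print(search_sentences(text, "Juliet"))
print(search_sentences(text, "Juliet", "dagger"))
```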
Natural Language Processing Features
EWA Systems' natural language processing package further improves the accuracy of the Text Mining and Text Search packages by parsing the complete sentence. As a result, the nuances resulting from the use of adjectives and adverbs and the context of surrounding sentences can be taken into account, and pronouns can be properly resolved. Many of these details are beyond the reach of the text mining and search packages alone, yet they greatly modify sentence meaning.
This package extends the capabilities of the text mining and search packages by filling in pronouns and implied subjects, and by improving the distinctions drawn from the wording when performing sorting or association tasks. For instance, text searches for lines depicting a specified person will now also return lines where that person is referenced by a pronoun.
Conceptual Text Analysis Features
EWA Systems' conceptual text analysis is an expert system that allows questions to be asked of a set of texts. In response, the expert system returns the answer along with its supporting information and rationale.