Text Mining: Design of Interactive Search Engine Based Regular Expressions of Online Automobile Advertisements
DOI:
https://doi.org/10.3991/ijep.v10i3.12419Keywords:
Information Extraction, Information Retrieval, Natural Language Processing, Text Mining, Web CrawlerAbstract
Technology world has greatly evolved over the past decades, which led to inflated data volume. This progress of technology in the digital form generated scattered texts across millions of web pages. Unstructured texts contain a vast amount of textual data. Discover of useful and interesting relations from unstructured texts requires more processing by computers. Therefore, text mining and information extraction have become an exciting research field to get structured and valuable information. This paper focuses on text pre-processing of automotive advertisements domains to configure a structured database. The structured database was created by extract the information over unstructured automotive advertisements, which is an area of natural language processing. Information extraction deals with finding factual information in text using learning regular expressions. We manually craft rule-based specific approaches to extract structured information from unstructured web pages. Structured information will be provided by user-friendly search engine designed for topic-specific knowledge. Consequently, this information that extracted from these advertisements uses to perform a structured search over certain interesting attributes. Thus, the tuples are assigned a probability and indexed to support the efficiency of extraction and exploration via user queries.
Downloads
Published
How to Cite
Issue
Section
License
The submitting author warrants that the submission is original and that she/he is the author of the submission together with the named co-authors; to the extend the submission incorporates text passages, figures, data or other material from the work of others, the submitting author has obtained any necessary permission.
Articles in this journal are published under the Creative Commons Attribution Licence (CC-BY What does this mean?). This is to get more legal certainty about what readers can do with published articles, and thus a wider dissemination and archiving, which in turn makes publishing with this journal more valuable for you, the authors.
By submitting an article the author grants to this journal the non-exclusive right to publish it. The author retains the copyright and the publishing rights for his article without any restrictions.
This journal has been awarded the SPARC Europe Seal for Open Access Journals (What's this?)