Industrial Engineering and Management Sciences

Public Deposited

Unstructured data such as text is plentiful and may contain valuable insights that lead to better decision-making. Obtaining these insights manually can be costly and time-consuming. Text mining, also known as text analytics, was developed to derive meaningful information from textual data. It is widely applied in domains such as business, law, social media, and biomedicine. This dissertation applies text mining techniques, including linguistic information retrieval, statistical learning, and machine learning, to three different problems. Patent litigation is generally unpredictable, disruptive, and expensive. The ability to predict the likelihood of patent litigation and to estimate the time to litigation in advance is valuable in many respects. In the second chapter, we propose predictive models that rely on textual and non-textual features to forecast patent litigation and the time to litigation. In the next chapter, we consider an application of text mining techniques in the health-care domain. On Community-based Question Answering sites, many health-related questions are posted but remain unanswered. We therefore develop an automated system that answers questions based on past question-answer pairs. In the last chapter, we address a semantic aspect of textual statements. Content from various sources, especially web sites, is not necessarily reliable and can therefore have a negative impact on readers. Hence, we propose an algorithm that validates the truthfulness of statements and provides supporting evidence for a false triplet.

Last modified
  • 02/22/2019