Question classification

There is an increasing trend for web users to ask questions and get answers from web portals. Web portals that let users ask and reply to questions are commonly known as Community Question Answering (CQA) services. These CQA services also allow users to search thro...

Bibliographic Details
Main Author: Teh, Li Li.
Other Authors: School of Computer Engineering
Format: Final Year Project
Language: English
Published: 2012
Online Access: http://hdl.handle.net/10356/48567
id sg-ntu-dr.10356-48567
record_format dspace
spelling sg-ntu-dr.10356-48567 2023-03-03T20:48:54Z Question classification Teh, Li Li. School of Computer Engineering Cong Gao DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications Bachelor of Engineering (Computer Science) 2012-04-26T05:57:37Z 2012-04-26T05:57:37Z 2012 2012 Final Year Project (FYP) http://hdl.handle.net/10356/48567 en Nanyang Technological University 32 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
spellingShingle DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
Teh, Li Li.
Question classification
description There is an increasing trend for web users to ask questions and get answers from web portals. Web portals that let users ask and reply to questions are commonly known as Community Question Answering (CQA) services. These CQA services also allow users to search through previously asked question-answer pairs or to browse them by category. However, the categories might not be precise enough to be searched effectively. Recent studies have shown that searching is more efficient when more specific subcategories are used. The project focuses on CQA for the topic of cancer, which is subcategorised into six health stages: 1) Stage 1: when healthy; 2) Stage 2: when the user thinks they might be ill; 3) Stage 3: before getting a medical test or checkup; 4) Stage 4: when diagnosed or self-diagnosed as ill; 5) Stage 5: before a treatment, surgery, or taking certain medications; 6) Stage 6: when receiving or taking treatments, medications, or exercise routines. The aim is to explore the effectiveness of CQA with more specific subcategories by using text classification to organize the questions asked into the six health stages. A web crawler was developed to extract thousands of cancer-related questions from a CQA portal and store them in XML format. Two classification techniques were used in the experiment, namely Naive Bayes and Decision Stump. In this experiment, Decision Stump performed better than Naive Bayes, with an overall accuracy of 57.384% compared to 51.8987% for Naive Bayes.
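The two classifiers named in the description can be illustrated with a minimal sketch. It is not the project's implementation: the record does not say which toolkit, features, or data were used, so the bag-of-words features, the Laplace smoothing, the accuracy-based split criterion, every function name, and the twelve toy questions below are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy, hand-written questions labelled with stages 1-6.
# Hypothetical data; NOT taken from the project's crawled data set.
TRAIN = [
    ("how can i lower my risk of cancer while healthy", 1),
    ("what diet helps prevent cancer", 1),
    ("i found a lump could it be cancer", 2),
    ("is this mole a sign of cancer", 2),
    ("what should i expect before a biopsy test", 3),
    ("how do i prepare before a mammogram test", 3),
    ("i was diagnosed with lymphoma what now", 4),
    ("just diagnosed with breast cancer and scared", 4),
    ("what should i ask before surgery", 5),
    ("risks to know before starting chemotherapy", 5),
    ("coping with nausea during chemotherapy treatment", 6),
    ("tired during radiation treatment is that normal", 6),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumed, very simple feature extractor)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Collect per-class document counts and per-class word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, text):
    """Multinomial Naive Bayes with add-one smoothing; unseen words are skipped."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def majority(labels, default):
    """Most frequent label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_decision_stump(examples):
    """One-level tree: pick the single word whose presence/absence split
    gives the highest training accuracy (a simplification; other stump
    implementations may split on entropy instead)."""
    vocab = sorted({w for text, _ in examples for w in tokenize(text)})
    overall = majority([y for _, y in examples], None)
    best = None
    for word in vocab:
        with_word = [y for t, y in examples if word in tokenize(t)]
        without = [y for t, y in examples if word not in tokenize(t)]
        yes = majority(with_word, overall)
        no = majority(without, overall)
        correct = sum((yes if word in tokenize(t) else no) == y
                      for t, y in examples)
        if best is None or correct > best[0]:
            best = (correct, word, yes, no)
    _, word, yes, no = best
    return word, yes, no


def predict_decision_stump(stump, text):
    word, yes, no = stump
    return yes if word in tokenize(text) else no


def accuracy(predict, model, examples):
    """Fraction of examples whose predicted stage matches the label."""
    return sum(predict(model, t) == y for t, y in examples) / len(examples)


if __name__ == "__main__":
    nb = train_naive_bayes(TRAIN)
    stump = train_decision_stump(TRAIN)
    print("Naive Bayes training accuracy:", accuracy(predict_naive_bayes, nb, TRAIN))
    print("Decision Stump training accuracy:", accuracy(predict_decision_stump, stump, TRAIN))
```

The numbers this toy script prints say nothing about the accuracies reported in the description, which come from the project's own data and setup. A held-out test set or cross-validation would be needed for any meaningful comparison.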
author2 School of Computer Engineering
author_facet School of Computer Engineering
Teh, Li Li.
format Final Year Project
author Teh, Li Li.
author_sort Teh, Li Li.
title Question classification
title_short Question classification
title_full Question classification
title_fullStr Question classification
title_full_unstemmed Question classification
title_sort question classification
publishDate 2012
url http://hdl.handle.net/10356/48567
_version_ 1759854047533203456