Hsinchun Chen

Professor, Management Information Systems
Regents Professor
Member of the Graduate Faculty
Professor, BIO5 Institute
Contact
(520) 621-4153

Research Interest

Dr. Chen's areas of expertise include:

Security informatics, security big data; smart and connected health, health analytics; data, text, web mining.

Digital library, intelligent information retrieval, automatic categorization and classification, machine learning for IR, large-scale information analysis and visualization.

Internet resource discovery, digital libraries, IR for large-scale scientific and business databases, customized IR, multilingual IR.

Knowledge-based systems design, knowledge discovery in databases, hypertext systems, machine learning, neural networks computing, genetic algorithms, simulated annealing.

Cognitive modeling, human-computer interactions, IR behaviors, human problem-solving process.

Publications

Fu, T., Abbasi, A., & Chen, H. (2010). A focused crawler for dark web forums. Journal of the American Society for Information Science and Technology, 61(6), 1213-1231.

Abstract:

The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content. In this study, we propose a novel crawling system designed to collect Dark Web forum content. The system uses a human-assisted accessibility approach to gain access to Dark Web forums. Several URL ordering features and techniques enable efficient extraction of forum postings. The system also includes an incremental crawler coupled with a recall-improvement mechanism intended to facilitate enhanced retrieval and updating of collected content. Experiments conducted to evaluate the effectiveness of the human-assisted accessibility approach and the recall-improvement-based, incremental-update procedure yielded favorable results. The human-assisted approach significantly improved access to Dark Web forums, while the incremental crawler with recall improvement also outperformed standard periodic- and incremental-update approaches. Using the system, we were able to collect over 100 Dark Web forums from three regions. A case study encompassing link and content analysis of collected forums was used to illustrate the value and importance of gathering and analyzing content from such online communities. © 2010 ASIS&T.

Huang, Z., Chung, W., & Chen, H. (2004). A graph model for e-commerce recommender systems. Journal of the American Society for Information Science and Technology, 55(3), 259-274.

Abstract:

Information overload on the Web has created enormous challenges for customers selecting products for online purchases and for online businesses attempting to identify customers' preferences efficiently. Various recommender systems employing different data representations and recommendation methods are currently used to address these challenges. In this research, we developed a graph model that provides a generic data representation and can support different recommendation methods. To demonstrate its usefulness and flexibility, we developed three recommendation methods: direct retrieval, association mining, and high-degree association retrieval. We used a data set from an online bookstore as our research test-bed. Evaluation results showed that combining product content information and historical customer transaction information achieved more accurate predictions and relevant recommendations than using only collaborative information. However, comparisons among different methods showed that high-degree association retrieval did not perform significantly better than the association mining method or the direct retrieval method in our test-bed.
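
To give a rough feel for how a graph representation can support retrieval through longer chains of association, the sketch below builds a toy customer-product graph and scores unseen products by counting the walks that reach them from a customer. It is an illustrative sketch only, not the authors' implementation: the data, the function names (build_index, recommend), the max_hops parameter, and the walk-counting score are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy data: customer -> set of purchased book ids.
purchases = {
    "alice": {"b1", "b2"},
    "bob": {"b2", "b3"},
    "carol": {"b3", "b4"},
}


def build_index(purchases):
    """Build the reverse adjacency map: product -> customers who bought it."""
    bought_by = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            bought_by[item].add(customer)
    return bought_by


def recommend(customer, purchases, bought_by, max_hops=3):
    """Score products the customer has not bought.

    The score is the number of walks of length 3, 5, ... up to max_hops
    that lead from the customer to the product through purchase links.
    """
    owned = purchases[customer]
    scores = defaultdict(int)
    # frontier maps each product to the number of walks that reach it so far.
    frontier = {item: 1 for item in owned}
    hops = 1
    while hops + 2 <= max_hops:
        next_frontier = defaultdict(int)
        for item, count in frontier.items():
            for other in bought_by[item]:
                if other == customer:
                    continue  # do not walk back through the target customer
                for nxt in purchases[other]:
                    next_frontier[nxt] += count
        for item, count in next_frontier.items():
            if item not in owned:
                scores[item] += count
        frontier = next_frontier
        hops += 2
    # Highest score first; ties broken by product id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


bought_by = build_index(purchases)
short_range = recommend("alice", purchases, bought_by, max_hops=3)
long_range = recommend("alice", purchases, bought_by, max_hops=5)
```

With max_hops=3 only products bought by customers who share a purchase with the target are reachable; raising max_hops lets more distant products enter the candidate list, which is the general idea behind retrieving higher-degree associations.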

Wang, G. A., Chen, H., & Atabakhsh, H. (2006). A multi-layer Naïve Bayes model for approximate identity matching. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3975 LNCS, 479-484.

Abstract:

Identity management is critical to various governmental practices ranging from providing citizen services to enforcing homeland security. The task of searching for a specific identity is difficult because multiple identity representations may exist due to issues related to unintentional errors and intentional deception. We propose a Naïve Bayes identity matching model that improves existing techniques in terms of effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based technique and achieves higher precision than the record comparison technique. In addition, our model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset that contains only 30% labeled instances, our model achieves performance comparable to that of fully supervised learning. © Springer-Verlag Berlin Heidelberg 2006.
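
As a minimal illustration of Naïve Bayes matching over pairs of identity records, the sketch below turns each pair into per-field similarity flags and estimates the probability that the two records refer to the same person. It is a plain, fully supervised, single-layer classifier, so it is not the multi-layer or semi-supervised model described in the abstract. The field names, the 0.8 similarity threshold, the use of difflib for string similarity, and the tiny training set are all assumptions made for this example.

```python
import math
from difflib import SequenceMatcher

FIELDS = ("name", "dob", "address")  # hypothetical identity attributes


def features(rec_a, rec_b, threshold=0.8):
    """Return one flag per field: 1 if the two values are similar enough."""
    return tuple(
        int(SequenceMatcher(None, rec_a[f].lower(), rec_b[f].lower()).ratio() >= threshold)
        for f in FIELDS
    )


def train(pairs):
    """Fit priors and Laplace-smoothed P(flag = 1 | label) from labeled pairs.

    pairs is a list of (feature_tuple, is_match).
    """
    model = {}
    for label in (True, False):
        rows = [x for x, y in pairs if y == label]
        prior = (len(rows) + 1) / (len(pairs) + 2)
        probs = [
            (sum(r[i] for r in rows) + 1) / (len(rows) + 2)
            for i in range(len(FIELDS))
        ]
        model[label] = (prior, probs)
    return model


def match_probability(model, x):
    """Posterior P(match | x) under the naive independence assumption."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for flag, p in zip(x, probs):
            score += math.log(p if flag else 1 - p)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


# Hypothetical labeled pairs, already converted to feature flags.
labeled = [
    ((1, 1, 1), True), ((1, 1, 0), True), ((1, 0, 1), True),
    ((0, 0, 0), False), ((0, 1, 0), False), ((0, 0, 1), False),
]
model = train(labeled)

rec_a = {"name": "John Smith", "dob": "1980-01-02", "address": "12 Main St"}
rec_b = {"name": "Jon Smith", "dob": "1980-01-02", "address": "98 Oak Ave"}
pair_flags = features(rec_a, rec_b)
pair_score = match_probability(model, pair_flags)
```

Comparing fields by string similarity rather than exact equality is what lets a misspelled name still count as agreement, which an exact-match lookup would miss.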

Zimbra, D., & Chen, H. (2010). Comparing the virtual linkage intensity and real world proximity of social movements. ISI 2010 - 2010 IEEE International Conference on Intelligence and Security Informatics: Public Safety and Security, 144-146.

Abstract:

The relationships between phenomena observed in the real world and their representations in virtual contexts have generated interest among researchers. In particular, the manifestations of social movements in virtual environments have been examined, with many studies dedicated to the analysis of the virtual linkages between groups. In this research, a form of link analysis was performed to examine the relationship between virtual linkage intensity and real world physical proximity among the social movement groups identified in the Southern Poverty Law Center Spring 2009 Intelligence Report. Findings indicate the existence of significant relationships between virtual linkage intensity and physical proximity, distinctive to various ideological categorizations. The results provide valuable insights into the behaviors of social movements in virtual environments. © 2010 IEEE.

Zhou, Y., Qin, J., Lai, G., Reid, E., & Chen, H. (2006). Exploring the dark side of the Web: Collection and analysis of U.S. extremist online forums. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3975 LNCS, 621-626.

Abstract:

Contents in extremist online forums are invaluable data sources for extremism research. In this study, we propose a systematic Web mining approach to collecting and monitoring extremist forums. Our proposed approach identifies extremist forums from various resources and addresses practical issues faced by researchers and experts in the extremist forum collection process. Such collection provides a foundation for quantitative forum analysis. Using the proposed approach, we created a collection of 110 U.S. domestic extremist forums containing more than 640,000 documents. The collection building results demonstrate the effectiveness and feasibility of our approach. Furthermore, the extremist forum collection we created could serve as an invaluable data source to enable a better understanding of the extremism movements. © Springer-Verlag Berlin Heidelberg 2006.