Hsinchun Chen

Hsinchun Chen

Professor, Management Information Systems
Regents Professor
Member of the Graduate Faculty
Professor, BIO5 Institute
Primary Department
Contact
(520) 621-4153

Research Interest

Dr Chen's areas of expertise include:Security informatics, security big data; smart and connected health, health analytics; data, text, web mining.Digital library, intelligent information retrieval, automatic categorization and classification, machine learning for IR, large-scale information analysis and visualization.Internet resource discovery, digital libraries, IR for large-scale scientific and business databases, customized IR, multilingual IR.Knowledge-based systems design, knowledge discovery in databases, hypertext systems, machine learning, neural networks computing, genetic algorithms, simulated annealing.Cognitive modeling, human-computer interactions, IR behaviors, human problem-solving process.

Publications

Woo, J., & Chen, H. (2016). Epidemic Model for Information Diffusion in Web Forums: Experiments in Marketing Exchange and Political Dialog. SpringerPlus.
Zhou, Y., Qin, J., Lai, G., & Chen, H. (2007). Collection of U.S. extremist online forums: A web mining approach. Proceedings of the Annual Hawaii International Conference on System Sciences.

Abstract:

Extremists' exploitation of computer-mediated communications such as online forums has recently gained much attention from academia and the government. However, due to the covert nature of these forums and the dynamic nature of the Internet, there have been no systematic methodologies developed for collection and analysis of online information created by extremists. In this study, we propose a systematic Web mining approach to collecting and monitoring extremist forums. Our proposed approach identifies extremist forums from various resources, addresses practical issues faced by researchers and experts in the extremist forum collection process. Such collection provides a foundation for quantitative forum analysis. Using the proposed approach, we created a collection of 110 U.S. domestic extremist forums containing more than 640,000 documents. We also report our findings on the multimedia usage pattern, participant distribution, and posting activity distribution. The collection building results demonstrate the effectiveness and feasibility of our approach. Furthermore, the extremist forum collection we created could serve as an invaluable data source to enable a better understanding of the extremists' movements. © 2007 IEEE.

Benjamin, V., Chen, H., & Zimbra, D. (2014). Bridging the Virtual and Real: The Relationship Between Web Content, Linkage, and Geographical Proximity of Social Movements. JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 65(11), 2210-2222.

As the Internet becomes ubiquitous, it has advanced to more closely represent aspects of the real world. Due to this trend, researchers in various disciplines have become interested in studying relationships between real-world phenomena and their virtual representations. One such area of emerging research seeks to study relationships between real-world and virtual activism of social movement organization (SMOs). In particular, SMOs holding extreme social perspectives are often studied due to their tendency to have robust virtual presences to circumvent real-world social barriers preventing information dissemination. However, many previous studies have been limited in scope because they utilize manual data-collection and analysis methods. They also often have failed to consider the real-world aspects of groups that partake in virtual activism. We utilize automated data-collection and analysis methods to identify significant relationships between aspects of SMO virtual communities and their respective real-world locations and ideological perspectives. Our results also demonstrate that the interconnectedness of SMO virtual communities is affected specifically by aspects of the real world. These observations provide insight into the behaviors of SMOs within virtual environments, suggesting that the virtual communities of SMOs are strongly affected by aspects of the real world.

Wang, G. A., Kaza, S., Joshi, S., Chang, K., Atabakhsh, H., & Chen, H. (2007). The Arizona IDMatcher: A probabilistic identity matching system. ISI 2007: 2007 IEEE Intelligence and Security Informatics, 229-235.

Abstract:

Various law enforcement and intelligence tasks require managing identity information in an effective and efficient way. However, the quality issues of identity information make this task non-trivial. Various heuristic based systems have been developed to tackle the identity matching problem. However, deploying such systems may require special expertise in system configuration and customization for optimal system performance. In this paper, we propose an alternative system called the Arizona IDMatcher. The system relies on a machine learning algorithm to automatically generate a decision model for identity matching. Such a system requires minimal human configuration effort. Experiments show that the Arizona IDMatcher is very efficient in detecting matching identity records. Compared to IBM Identity Resolution (a commercial, heuristic-based system), the Arizona IDMatcher achieves better recall and overall F-measures in identifying matching identities in two large-scale real-world datasets. © 2007 IEEE.

Chen, H., Ng, T. D., Martinez, J., & Schatz, B. R. (1997). A concept space approach to addressing the vocabulary problem in scientific information retrieval: An experiment on the worm community system. Journal of the American Society for Information Science, 48(1), 17-31.

Abstract:

This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths from one subject domain to the other. Based on a cognitive study of term association involving four biologists, we found that a large percentage (59.6-85.6%) of the terms suggested by the subjects were identified in the conjoined fly-worm thesaurus. However, we found only a small percentage (8.4-18.1%) of the associations suggested by the subjects in the thesaurus. In a follow-up document retrieval study involving eight fly biologists, an actual worm database (Worm Community System), and the conjoined flyworm thesaurus, subjects were able to find more relevant documents (an increase from about 9 documents to 20) and to improve the document recall level (from 32.41 to 65.28%) when using the thesaurus, although the precision level did not improve significantly. Implications of adopting the concept space approach for addressing the vocabulary problem in Internet and digital libraries applications are also discussed.