Hsinchun Chen

Professor, Management Information Systems
Regents Professor
Member of the Graduate Faculty
Professor, BIO5 Institute
Contact
(520) 621-4153

Research Interest

Dr Chen's areas of expertise include:

Security informatics, security big data; smart and connected health, health analytics; data, text, web mining.

Digital library, intelligent information retrieval, automatic categorization and classification, machine learning for IR, large-scale information analysis and visualization.

Internet resource discovery, digital libraries, IR for large-scale scientific and business databases, customized IR, multilingual IR.

Knowledge-based systems design, knowledge discovery in databases, hypertext systems, machine learning, neural networks computing, genetic algorithms, simulated annealing.

Cognitive modeling, human-computer interactions, IR behaviors, human problem-solving process.

Publications

Lu, H., Tsai, F., Chen, H., Hung, M., & Li, S. (2012). Credit rating change modeling using news and financial ratios. ACM Transactions on Management Information Systems, 3(3).

Abstract:

Credit ratings convey credit risk information to participants in financial markets, including investors, issuers, intermediaries, and regulators. Accurate credit rating information plays a crucial role in supporting sound financial decision-making processes. Most previous studies on credit rating modeling are based on accounting and market information. Text data are largely ignored despite the potential benefit of conveying timely information regarding a firm's outlook. To leverage the additional information in news full-text for credit rating prediction, we designed and implemented a news full-text analysis system that provides firm-level coverage, topic, and sentiment variables. The novel topic-specific sentiment variables contain a large fraction of missing values because of uneven news coverage. The missing value problem creates a new challenge for credit rating prediction approaches. We address this issue by developing a missing-tolerant multinomial probit (MT-MNP) model, which imputes missing values based on the Bayesian theoretical framework. Our experiments using seven and a half years of real-world credit ratings and news full-text data show that (1) the overall news coverage can explain future credit rating changes while the aggregated news sentiment cannot; (2) topic-specific news coverage and sentiment have statistically significant impact on future credit rating changes; (3) topic-specific negative sentiment has a more salient impact on future credit rating changes compared to topic-specific positive sentiment; (4) MT-MNP performs better in predicting future credit rating changes compared to support vector machines (SVM). The performance gap as measured by macroaveraging F-measure is small but consistent. © 2012 ACM.

Reid, E., Qin, J., Zhou, Y., Lai, G., Sageman, M., Weimann, G., & Chen, H. (2005). Collecting and analyzing the presence of terrorists on the Web: A case study of Jihad Websites. Lecture Notes in Computer Science, 3495, 402-411.

Abstract:

The Internet, which has enabled global businesses to flourish, has become the very same channel for mushrooming 'terrorist news networks.' Terrorist organizations and their sympathizers have found a cost-effective resource to advance their causes by posting high-impact Websites with short shelf-lives. Because of the evanescent nature of these Websites, terrorism research communities require unrestrained access to digitally archived Websites to mine their contents and pursue various types of analyses. However, organizations that specialize in capturing, archiving, and analyzing Jihad terrorist Websites employ different, manual analysis techniques that are inefficient and not scalable. This study proposes the development of automated or semi-automated procedures and systematic methodologies for capturing Jihad terrorist Website data and its subsequent analyses. By analyzing the content of hyperlinked terrorist Websites and constructing visual social network maps, our study is able to generate an integrated approach to the study of Jihad terrorism, its network structure, component clusters, and cluster affinity. © Springer-Verlag Berlin Heidelberg 2005.

Atabakhsh, H., Larson, C., Petersen, T., Violette, C., & Chen, H. (2004). Information sharing and collaboration policies within government agencies. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3073, 467-475.

Abstract:

This paper describes the necessity for government agencies to share data as well as obstacles to overcome in order to achieve information sharing. We study two domains: law enforcement and disease informatics. Some of the ways in which we were able to overcome the obstacles, such as data security and privacy issues, are explained. We conclude by highlighting the lessons learned while working towards our goals. © Springer-Verlag Berlin Heidelberg 2004.

Qin, J., Xu, J. J., Hu, D., Sageman, M., & Chen, H. (2005). Analyzing terrorist networks: A case study of the Global Salafi Jihad network. Lecture Notes in Computer Science, 3495, 287-304.

Abstract:

It is very important for us to understand the functions and structures of terrorist networks to win the battle against terror. However, previous studies of terrorist network structure have generated few actionable results. This is mainly due to the difficulty in collecting and accessing reliable data and the lack of advanced network analysis methodologies in the field. To address these problems, we employed several advanced network analysis techniques ranging from social network analysis to Web structural mining on a Global Salafi Jihad network dataset collected through a large-scale empirical study. Our study demonstrated the effectiveness and usefulness of advanced network techniques in the terrorist network analysis domain. We also introduced the Web structural mining technique into the terrorist network analysis field which, to the best of our knowledge, has never been used in this domain. More importantly, the results from our analysis provide not only insights for the terrorism research community but also empirical implications that may help law-enforcement, intelligence, and security communities to make our nation safer. © Springer-Verlag Berlin Heidelberg 2005.

Benjamin, V. A., & Chen, H. (2013). Machine learning for attack vector identification in malicious source code. IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics, 21-23.

Abstract:

As computers and information technologies become ubiquitous throughout society, the security of our networks and information technologies is a growing concern. As a result, many researchers have become interested in the security domain. Among them, there is growing interest in observing hacker communities for early detection of developing security threats and trends. Research in this area has often reported hackers openly sharing cybercriminal assets and knowledge with one another. In particular, the sharing of raw malware source code files has been documented in past work. Unfortunately, malware code documentation often appears to be missing, incomplete, or written in a language foreign to researchers. Thus, analysis of such source files embedded within hacker communities has been limited. Here we utilize a subset of popular machine learning methodologies for the automated analysis of malware source code files. Specifically, we explore genetic algorithms to resolve questions related to feature selection within the context of malware analysis. Next, we utilize two common classification algorithms to test selected features for identification of malware attack vectors. Results suggest a promising direction in utilizing such techniques to help with the automated analysis of malware source code. © 2013 IEEE.