Hsinchun Chen
Publications
Abstract:
Credit ratings convey credit risk information to participants in financial markets, including investors, issuers, intermediaries, and regulators. Accurate credit rating information plays a crucial role in supporting sound financial decision-making processes. Most previous studies on credit rating modeling are based on accounting and market information. Text data are largely ignored despite the potential benefit of conveying timely information regarding a firm's outlook. To leverage the additional information in news full-text for credit rating prediction, we designed and implemented a news full-text analysis system that provides firm-level coverage, topic, and sentiment variables. The novel topic-specific sentiment variables contain a large fraction of missing values because of uneven news coverage. The missing value problem creates a new challenge for credit rating prediction approaches. We address this issue by developing a missingtolerant multinomial probit (MT-MNP) model, which imputes missing values based on the Bayesian theoretical framework. Our experiments using seven and a half years of real-world credit ratings and news full-text data show that (1) the overall news coverage can explain future credit rating changes while the aggregated news sentiment cannot; (2) topic-specific news coverage and sentiment have statistically significant impact on future credit rating changes; (3) topic-specific negative sentiment has a more salient impact on future credit rating changes compared to topic-specific positive sentiment; (4) MT-MNP performs better in predicting future credit rating changes compared to support vector machines (SVM). The performance gap as measured by macroaveraging F-measure is small but consistent. © 2012 ACM.
Abstract:
The Internet which has enabled global businesses to flourish has become the very same channel for mushrooming 'terrorist news networks.' Terrorist organizations and their sympathizers have found a cost-effective resource to advance their courses by posting high-impact Websites with short shelf-lives. Because of their evanescent nature, terrorism research communities require unrestrained access to digitally archived Websites to mine their contents and pursue various types of analyses. However, organizations that specialize in capturing, archiving, and analyzing Jihad terrorist Websites employ different, manual-based analyses techniques that are inefficient and not scalable. This study proposes the development of automated or semi-automated procedures and systematic methodologies for capturing Jihad terrorist Website data and its subsequent analyses. By analyzing the content of hyperlinked terrorist Websites and constructing visual social network maps, our study is able to generate an integrated approach to the study of Jihad terrorism, their network structure, component clusters, and cluster affinity. © Springer-Verlag Berlin Heidelberg 2005.
Abstract:
This paper describes the necessity for government agencies to share data as well as obstacles to overcome in order to achieve information sharing. We study two domains: law enforcement and disease informatics. Some of the ways in which we were able to overcome the obstacles, such as data security and privacy issues, are explained. We conclude by highlighting the lessons learned while working towards our goals. © Springer-Verlag Berlin Heidelberg 2004.
Abstract:
It is very important for us to understand the functions and structures of terrorist networks to win the battle against terror. However, previous studies of terrorist network structure have generated little actionable results. This is mainly due to the difficulty in collecting and accessing reliable data and the lack of advanced network analysis methodologies in the field. To address these problems, we employed several advance network analysis techniques ranging from social network analysis to Web structural mining on a Global Salafi Jihad network dataset collected through a large scale empirical study. Our study demonstrated the effectiveness and usefulness of advanced network techniques in terrorist network analysis domain. We also introduced the Web structural mining technique into the terrorist network analysis field which, to the best our knowledge, has never been used in this domain. More importantly, the results from our analysis provide not only insights for terrorism research community but also empirical implications that may help law-reinforcement, intelligence, and security communities to make our nation safer. © Springer-Verlag Berlin Heidelberg 2005.
Abstract:
As computers and information technologies become ubiquitous throughout society, the security of our networks and information technologies is a growing concern. As a result, many researchers have become interested in the security domain. Among them, there is growing interest in observing hacker communities for early detection of developing security threats and trends. Research in this area has often reported hackers openly sharing cybercriminal assets and knowledge with one another. In particular, the sharing of raw malware source code files has been documented in past work. Unfortunately, malware code documentation appears often times to be missing, incomplete, or written in a language foreign to researchers. Thus, analysis of such source files embedded within hacker communities has been limited. Here we utilize a subset of popular machine learning methodologies for the automated analysis of malware source code files. Specifically, we explore genetic algorithms to resolve questions related to feature selection within the context of malware analysis. Next, we utilize two common classification algorithms to test selected features for identification of malware attack vectors. Results suggest promising direction in utilizing such techniques to help with the automated analysis of malware source code. © 2013 IEEE.