Prioritizing and Classifying 8-K Information Using XGBoost


Market operators and regulators must work together for markets to function smoothly. Within this dynamic, regulators monitor corporate announcements and step in when necessary. In Brazil, the securities regulator CVM has the power to halt trading in a stock when related corporate announcements may cause undue market volatility. It bases this decision on 8-K announcements issued for various corporate events, including bankruptcy, a change in CEO, changes in dividend policy, and IPOs. Unfortunately, this laborious process requires regulators to sift through mountains of documents, many of which ultimately require no action. To improve this process, this paper introduces a text classification system for these documents. The system allows regulators to examine first the announcements that are most likely to require immediate action. We add to the existing text classification literature by comparing different classification techniques. We find that our XGBoost model performs better than previously used methods such as random forest, neural networks, and naive Bayes. To our knowledge, this method has not previously been used for text classification within the accounting field. We believe it to be a superior classification methodology for future research applications.
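
As a rough illustration of the prioritization step, the sketch below orders announcements by a classifier's predicted probability that regulatory action is needed. It is a minimal sketch rather than the system described in this paper: the `Announcement` record, the `prioritize` helper, the 0.5 threshold, and the example scores are all hypothetical. In practice the scores might come from any probabilistic classifier, for example an XGBoost model's predicted class probabilities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Announcement:
    """Hypothetical record for one corporate announcement."""
    doc_id: str
    text: str
    action_probability: float  # assumed classifier output in [0, 1]


def prioritize(announcements: List[Announcement],
               threshold: float = 0.5) -> List[Announcement]:
    """Keep announcements at or above the threshold, most urgent first."""
    flagged = [a for a in announcements if a.action_probability >= threshold]
    return sorted(flagged, key=lambda a: a.action_probability, reverse=True)


# Made-up documents and scores, for illustration only.
queue = prioritize([
    Announcement("A-001", "Notice of change in dividend policy", 0.35),
    Announcement("A-002", "Company files for bankruptcy protection", 0.92),
    Announcement("A-003", "Appointment of new chief executive officer", 0.71),
])
review_order = [a.doc_id for a in queue]
```

The threshold trades regulator workload against the risk of missing an announcement that needs action, so it would have to be chosen on validation data rather than fixed in advance.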