Researchers show that stock price prediction models can be tricked by simply quote tweeting an influencer’s post on Twitter and altering a word or two.
Investors have always kept a close eye on the news, looking for any hint that a stock price is headed up or down. That search for a competitive edge has widened in recent years with the advent of machine-learning models that scan the web for clues that stocks are under- or overvalued and should be bought or sold. But these predictive models may not be as trustworthy as they seem, according to a new study [1] presented at the natural language processing conference NAACL.
“If you want to manipulate stock prices, you don’t need access to an investor’s model or data,” said Dakuo Wang, a researcher at IBM. “You just create a few hundred fake Twitter accounts, pretend to be an investor, and change a word or two when quote tweeting the CEO.”
Original tweets can’t be edited, but you can say whatever you want in a quote tweet. It’s this vulnerability that Wang and his colleagues exploit in a new technique for launching adversarial attacks on deep learning-based stock-prediction models. Their experiments may be the first to probe the weaknesses of financial models that base their forecasts, in part, on news gathered from social media.
Using publicly available data and prediction models, the researchers built a tool that picks, from a series of influencer tweets, the one that appears easiest to attack. The tool then finds the word in the target tweet that's most likely to flip the model's prediction and swaps it for a semantically similar word when it quote tweets the original. Because the substitute has a similar meaning, it is unlikely to raise red flags among readers, yet it causes the model to reverse its prediction.
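To make the idea concrete, here is a minimal sketch of a single-word substitution search. It is not the researchers' tool: the word-weight "model" (`toy_score`), the weights, the synonym table, and the function `find_quote_tweet` are all hypothetical stand-ins chosen so the example runs on its own. The sketch assumes the attacker can only append a quote tweet, never edit the original.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a stock-prediction model: it sums hand-picked
# word weights over all tweets. A positive score means "buy", otherwise "sell".
TOY_WEIGHTS: Dict[str, float] = {
    "surge": 1.0,
    "strong": 0.6,
    "solid": 0.5,
    "weak": -0.8,
    "spike": -2.5,  # assumption: the toy model reads "spike" as a bad sign
}

def toy_score(tweets: List[str]) -> float:
    return sum(TOY_WEIGHTS.get(w.lower(), 0.0) for t in tweets for w in t.split())

# Hypothetical synonym table; a real attack would need a thesaurus or embeddings.
SYNONYMS: Dict[str, List[str]] = {"surge": ["spike"], "strong": ["solid"]}

def find_quote_tweet(
    tweets: List[str],
    score: Callable[[List[str]], float],
    synonyms: Dict[str, List[str]],
) -> Optional[Tuple[str, float]]:
    """Try every single-word synonym swap in every tweet, append the altered
    copy as a quote tweet, and keep the candidate that pushes the score
    furthest against the original prediction. Return (quote tweet, new score)
    if that candidate flips the prediction, otherwise None."""
    base = score(tweets)
    direction = 1.0 if base > 0 else -1.0
    best: Optional[Tuple[str, float]] = None
    for tweet in tweets:
        words = tweet.split()
        for i, word in enumerate(words):
            for sub in synonyms.get(word.lower(), []):
                candidate = " ".join(words[:i] + [sub] + words[i + 1:])
                new_score = score(tweets + [candidate])
                if best is None or direction * new_score < direction * best[1]:
                    best = (candidate, new_score)
    if best is not None and (best[1] > 0) != (base > 0):
        return best
    return None

tweets = ["profits surge on strong demand", "supply stays weak"]
result = find_quote_tweet(tweets, toy_score, SYNONYMS)
print(toy_score(tweets) > 0, result)
```

In this toy setup the original tweets score as a "buy", and the search settles on quote tweeting the first tweet with "surge" swapped for "spike", which drags the combined score below zero. A real model is far less transparent than a table of word weights, so an actual attack has to estimate which tweet and which word matter most rather than read them off directly.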
After ingesting a doctored tweet, a model that would otherwise have predicted a falling stock price and advised investors to sell might reverse its decision and nudge them to buy instead. “The fake retweet can fool the stock prediction model, but the human eye is unlikely to notice that anything has changed,” said IBM researcher and senior author of the study, Pin-Yu Chen.
The vulnerability of machine vision models to adversarial attacks is by now familiar. Change an object’s orientation or a subtle physical detail, and it’s possible to trick a top-performing classifier into making mistakes that most humans never would. Even straightforward people-detecting models can be fooled. In an earlier study [2], Chen showed that a T-shirt printed with a model-deceiving pattern on the front could double as an invisibility cloak.
Language models are increasingly being put under the microscope, too. The idea of testing the security of text-analysis models used in financial trading came from the study’s first author, Yong Xie, a PhD student studying math and finance at the University of Illinois Urbana-Champaign. From an earlier internship at a trading firm in China, Xie knew that investors now commonly use these models to detect market trends. During his recent internship through the IBM-Illinois Discovery Accelerator Institute, Xie proposed that the team see just how robust the models were.
Xie and the team searched for open-source stock-prediction models and found three: Stocknet, FinGRU, and FinLSTM. They then simulated attacks on the models and measured the impact on a hypothetical $10,000 portfolio.
Under the researchers’ simulated “long-only buy-hold-sell” strategy, their portfolio dropped to $7,000 due to normal turbulence over the two-year stretch of stock market data they chose as a benchmark. When they factored in the Twitter attacks, their portfolio lost an additional $3,200, falling to $3,800, they found.
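As a rough illustration of how flipped predictions translate into losses, the sketch below simulates one plausible reading of a long-only buy-hold-sell rule: when the model predicts a rise, the whole balance is invested at that day's price and sold at the next day's price; otherwise it stays in cash. The rule's details, the function `simulate_long_only`, and the prices are assumptions made for illustration. They are not the researchers' data and do not reproduce the $7,000 or $3,800 figures.

```python
from typing import Sequence

def simulate_long_only(
    start_cash: float,
    prices: Sequence[float],
    predicts_up: Sequence[bool],
) -> float:
    """Assumed long-only rule: on each day the model predicts a rise, buy at
    that day's price and sell at the next day's price; otherwise hold cash."""
    if len(predicts_up) != len(prices) - 1:
        raise ValueError("need exactly one prediction per day-to-day price move")
    cash = start_cash
    for day, go_long in enumerate(predicts_up):
        if go_long:
            cash *= prices[day + 1] / prices[day]
    return cash

# Made-up prices: up 10%, down 10%, up 10%.
prices = [100.0, 110.0, 99.0, 108.9]

clean = simulate_long_only(10_000.0, prices, [True, False, True])
# Hypothetical attack: the middle prediction is flipped from "sell" to "buy".
attacked = simulate_long_only(10_000.0, prices, [True, True, True])
print(round(clean, 2), round(attacked, 2))
```

With these invented numbers, a single flipped prediction on the down day is enough to leave the attacked portfolio below the clean one, which is the mechanism behind the additional loss the researchers measured.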
The models they attacked were created by academics and are probably not used directly in trading, the researchers said. However, the models are similar enough to those currently in use that the security flaws identified in the paper may apply. The researchers are making their code publicly available so that investors can test their own models.
“Machine learning brings new risks to financial decisions,” said Xie. “With pensions and college savings at stake, we need to understand where the vulnerabilities are and how to reduce them.”
The attack tool is among several open-source text-based tools that IBM is releasing this week. PrimeQA is the first software library to integrate algorithms for reading and responding to questions in more than 90 languages and to handle question-answering problems embedded in tables, photos, and video. Label Sleuth allows users with no machine-learning knowledge to build a customized text classifier to quickly analyze large bodies of text.
1. Xie, Y., Wang, D., Chen, P., et al. A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Prediction. arXiv. Submitted 1 May 2022 (v1), last revised 11 May 2022 (v2).
2. Xu, K. et al. (2020). Adversarial T-Shirt! Evading Person Detectors in a Physical World. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12350. Springer, Cham. https://doi.org/10.1007/978-3-030-58558-7_39