Published: 2025-12-01

Apache Spark for Business and Financial Data Engineering: A Systematic Literature Review

DOI: 10.35870/ijsecs.v5i3.5419

IJSECS Volume 5, Number 3, December 2025

Abstract

This paper presents a systematic literature review (SLR) that maps how Apache Spark is applied to data engineering in the business and finance domains. Organizations continue to handle growing volumes of data, so practitioners and researchers alike want to know how Apache Spark has been used to solve big data problems. We analyze publications indexed in the Scopus database from 2021 to 2025 to identify current trends and methodologies as well as existing research gaps in the field. We find that Apache Spark is used primarily for sentiment and trend analysis of social media, particularly Twitter, because its real-time processing capability helps in understanding market dynamics and consumer behavior. Spark is also applied to predictive tasks such as forecasting customer churn and pricing financial assets (stocks, bonds, options), which demonstrates its versatility across business applications. In addition, it is widely used for anomaly detection, such as identifying fraudulent transactions, with efficiency and cost as the main drivers of adoption. However, the landscape is not uniform. Some studies propose alternative platforms, which indicates that Apache Spark may not be the best option in every scenario. Based on these findings, we suggest several directions for future research that could advance the field. These include using social media sources other than Twitter to capture broader market sentiment and applying a wider variety of algorithms to improve prediction accuracy. They also include extending Spark into new areas such as currency exchange rate forecasting, credit risk analysis, Anti-Money Laundering (AML) detection, and Data Lakehouse architecture implementation. These recommendations are intended to guide researchers toward underexplored areas where Apache Spark could unlock significant value for business and finance.

Keywords

Apache Spark; Business; Finance; Data Engineering; Systematic Literature Review (SLR)

Peer Review Process

This article has undergone a double-blind peer review process to ensure quality and impartiality.