Document Type

News Article

Publication Date

Summer 5-22-2026

Abstract

Financial fraud detection is a high-volume, high-velocity analytics problem. Traditional rule-based systems are often easy to deploy, but they are limited by static thresholds, delayed response, high false-positive rates, and weak explainability. This report presents a formalized end-to-end Big Data architecture for real-time fraud and anomaly detection in financial transaction streams.

The proposed architecture ingests transaction events through AWS Kinesis, enriches them through an Apache Flink stream-processing layer, scores them with an XGBoost classifier, explains model outputs using SHAP, and converts structured evidence into human-readable summaries through a controlled GPT explanation layer. Results are persisted through a hybrid storage strategy using MongoDB for semi-structured fraud events and PostgreSQL for structured business, audit, and compliance records. Grafana dashboards and reports expose outputs to analysts and managers.
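As a minimal sketch of the hand-off between the SHAP step and the explanation layer described above, the snippet below ranks per-feature contributions by magnitude and serializes the top few into a fixed-shape record that could be passed to a language model. The `FraudEvidence` class, the helper names, and the field names are hypothetical and do not come from the report; the contribution values are placeholders rather than real SHAP output.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FraudEvidence:
    """Hypothetical structured evidence record for one scored transaction."""
    transaction_id: str
    fraud_score: float
    top_features: list  # (feature name, contribution) pairs, largest magnitude first


def build_evidence(transaction_id, fraud_score, contributions, k=3):
    """Keep the k contributions with the largest absolute value."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return FraudEvidence(transaction_id, fraud_score, ranked[:k])


def to_prompt_payload(evidence):
    """Serialize the evidence deterministically so the explanation layer
    only ever sees fields that were explicitly selected."""
    return json.dumps(asdict(evidence), sort_keys=True)


if __name__ == "__main__":
    # Placeholder contributions; in the described pipeline these would come from SHAP.
    example = build_evidence(
        "tx-1",
        0.97,
        {"amount": 0.5, "oldbalanceOrg": -0.8, "type_TRANSFER": 0.3, "step": 0.01},
        k=2,
    )
    print(to_prompt_payload(example))
```

Restricting the payload to a small, fixed set of fields is one way to keep the generated summary tied to model evidence rather than to free-form transaction data.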

The machine-learning evaluation reports fraud recall of 99.57% and ROC-AUC of 0.9997, indicating that the model identifies nearly all fraudulent transactions in the experimental PaySim setting. However, precision is only 19.56%, meaning roughly four in five alerts are false positives. This creates a clear operational trade-off: the system catches fraud aggressively but produces many alerts requiring review. This report therefore recommends threshold tuning, alert prioritization, and cost-sensitive evaluation before any production deployment.
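The trade-off above can be made concrete with a short sketch. The first function converts the reported precision into the number of false alerts per correctly flagged fraud; the second illustrates one simple form of cost-sensitive threshold tuning. The cost values, the toy scores, and both function names are assumptions made for illustration, not figures or methods taken from the report.

```python
def false_alerts_per_true_fraud(precision):
    """False positives per true positive implied by a given precision."""
    return (1.0 - precision) / precision


def pick_threshold(scores, labels, cost_fp, cost_fn):
    """Return (threshold, total_cost) minimizing review cost plus missed-fraud cost.

    A transaction is flagged when its score is >= threshold.
    Ties keep the lowest threshold that reaches the minimum cost.
    """
    best = None
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = fp * cost_fp + fn * cost_fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


if __name__ == "__main__":
    # Reported precision of 19.56% implies about 4.1 false alerts per true fraud.
    print(round(false_alerts_per_true_fraud(0.1956), 2))

    # Toy scores and labels (1 = fraud); the costs are hypothetical.
    toy_scores = [0.1, 0.4, 0.6, 0.9]
    toy_labels = [0, 0, 1, 1]
    print(pick_threshold(toy_scores, toy_labels, cost_fp=1, cost_fn=10))
```

In practice the two costs would need to come from the business, for example analyst time per review versus the expected loss from a missed fraud, and the threshold would be chosen on held-out data.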

The main contribution of the project is not a standalone fraud classifier. It is a complete, explainable, and operationally oriented architecture that connects real-time data movement, model-based scoring, interpretability, natural-language reporting, storage, monitoring, and governance. Current low-code AI platforms can accelerate dashboards, workflows, alerting, and prototype interfaces, but they cannot fully replace the specialized streaming, custom modeling, explainability, and governance requirements of the proposed system.

Program or Discipline Name

Computer and Information Sciences
