Open to opportunities · New York, US

Aniket Abhishek Soni

Senior Data Engineer — Databricks · Snowflake · Cloud · AI/ML

7+ years designing enterprise-scale data pipelines, ML-powered analytics systems, and cloud-native infrastructure. Published researcher with 18+ papers, IEEE Senior Member, international speaker, and judge at MIT, IEEE, Globee, and MLH.

Location: New York, US
Current Role: Sr. Data Engineer @ Cognizant
ResearchGate: 18+ Papers Published
Years Experience
7+
in data engineering & cloud
Records / Day
10M+
processed at enterprise scale
Associate Cloud Engineer
Google Cloud Certified
Data Engineer Associate
Databricks Certified
IEEE
Senior Member (SMIEEE) · Upsilon Pi Epsilon Member · PgUS Member · Sigma Xi Affiliate · 20+ peer reviews · IEEE Day 2025 Ambassador · Fellow SCRS · Royal Fellow IOASD · Life Member Computer Society of India
AFCEA 40 Under Forty 2024 · Rising Stars 30, Channel Co. 2025 · Speaker, Google DevFest Bronx 2023 · Judge, MIT100k PITCH. ACCELERATE. LAUNCH · Judge, Jesse H. Neal 2024–26 · Judge, Globee Awards 2023 & 2025
Technical Stack

The Pipeline

Languages
Python
SQL
PySpark
Java
Go (Golang)
Scala
🗄️ Platforms
Databricks
Snowflake
Delta Lake
PostgreSQL
MongoDB
SQL Server
🔄 Pipelines
Apache Airflow
Spark Streaming
ETL / ELT
Data Observability
CI/CD Pipelines
GitLab / GitHub
☁️ Cloud
AWS (S3, Glue, EC2)
GCP (ACE Certified)
Azure / OneLake
Docker
Kubernetes
IAM / KMS
📊 Analytics
Power BI
Tableau
Matplotlib / Seaborn
Folium / Geoplot
KPI Dashboards
Scikit-learn
🧠 Domains
Financial Data
Geospatial Analytics
NLP / Speech AI
Climate / NOAA
Data Governance
Healthcare Data
Career History

Experience

Mar 2021 — Present
Cognizant
Senior Data Engineer
📍 New York, US · Full-time
Designed and automated high-volume ETL pipelines processing 10M+ records/day from custodial, trading and reference systems
Built KPI dashboards and data models improving accuracy by 30% and reducing reporting errors by 37%
Developed Slack + Databricks alerting system cutting incident resolution time by 60%
Implemented governance frameworks improving onboarding efficiency by 30%
Analyzed healthcare datasets in Databricks; enhanced Power BI dashboards for clinical teams
Built SQL-based validation frameworks improving reconciliation rates by ~30%
Mentored junior engineers, reducing onboarding time by 30%
Databricks · Snowflake · PySpark · Python · Power BI · SQL · GitLab · MongoDB
Sep 2020 — Mar 2021
Climate Change Xplorers
Research Analyst
📍 New York, US · Part-time
Built ETL pipelines ingesting ~10K records/day from NOAA, FSI, MADIS, IBM Watson and GeoNames
Analyzed 150M+ data points to identify 151 optimal global IoT weather station locations
Designed interactive geospatial plots boosting rendering performance by 70% vs. static visualizations
Built a statistical model using FSI data across 8 categories for global station deployment optimization
Published research findings contributing to climate monitoring strategies
Python · NOAA/FSI · Folium · Geospatial · Pandas · ETL
Jul 2020 — Sep 2020
Converseon.AI
Software Engineer Intern
📍 New York, US · Part-time
Resolved 7+ JIRA tickets/week including critical backend bugs in a Python-Django computer vision platform
Built ETL cron-jobs to ingest Twitter API and YouTube API data into ElasticSearch for Tableau analysis
Optimized PostgreSQL queries, improving page load times on 2 critical pages
Created client FAQ documentation saving 4 dev hours/week
Python · Django · PostgreSQL · ElasticSearch · Travis CI · Tableau
Nov 2018 — May 2020
Southern Arkansas Univ.
Graduate & Teaching Assistant
📍 Magnolia, AR, US
Maintained Honors College web server and developed academic webpages for operational support
Managed library inventory, student records, and database troubleshooting at Magale Library
Provided front-line IT support to students and faculty across campus
Assisted in student recruiting, records management, and faculty meeting preparation
Web Dev · DB Admin · IT Support
Jan 2018 — Jun 2018
Alok Enterprise
Computer Programmer & Data Analyst
📍 Ahmedabad, India · Full-time
Developed and maintained small-scale .NET applications for day-to-day business operations
Analyzed sales, inventory and service data in Excel to improve business decision-making
Automated routine reporting and data entry tasks, improving accuracy and reducing manual workload
Tested in-house apps with cross-functional teams to ensure stable releases
.NET · ASP.NET · MySQL · Excel · HTML/CSS
Portfolio

Some Projects

01
ML · Generative AI · Audio
Music Mood App
Built for Google AI's Music Mood App Competition. Developed a full ML pipeline using generative AI for mood-based music classification. Designed an interactive UI that boosted user engagement by 40%; the classifier reached 85% accuracy, a 30% improvement over baseline mood detection models.
85% ML Accuracy · +40% Engagement
Aug 2024 · Google AI Competition
02
NLP · Speech · Python
Offline Speech-to-Text (Vosk)
High-accuracy offline transcription tool using Vosk + Python. Integrated KaldiRecognizer with pre-trained models, automated audio-to-text export as DOCX, and eliminated 100% of manual transcription effort.
100% Manual Effort Saved · Offline-First
May 2024 · arXiv Published
03
Web · Air Quality · Django
Air Quality Web App
Real-time AQI lookup by ZIP code built with Python-Django backend and Bootstrap frontend. Provides instant air quality insights via intuitive search interface.
Jun 2020
04
Data Analysis · Public Health · Python
COVID-19 Data Analysis
Pulled Johns Hopkins COVID-19 data, analyzed it with Pandas, and visualized with Plotly + Folium. Compared countries by social distancing adherence, government response, healthcare capacity, and testing feasibility. One of several global comparisons during peak pandemic uncertainty.
Johns Hopkins DB · Plotly + Folium
May 2020
05
NLP · Twitter API · ML
Twitter Sentiment Analysis
Live tweet ingestion via Twitter API → PostgreSQL → Scikit-learn Naive Bayes model. Achieved >80% AUC across positive, negative, and neutral classes with manually annotated training data.
>80% AUC
Apr 2020
06
.NET · Healthcare · ERP
Medicines Stock Management System
Full-stack .NET application with prescription management, real-time dashboards, automated email/SMS alerts to distributors, and smart low-stock threshold detection. Led front-end and business logic development.
May 2016 · Capstone Project
Academic Output

Research & Publications

Type · Title & Venue · Year · DOI / ID
Book
Scalable Infrastructure: Building Reliable Distributed Systems
BP International · ISBN 978-93-49970-00-7
2025 · 10.9734/bpi
Book
Building Organizational Intelligence Using Digital Twins and Generative AI (Chapter)
Wiley-Scrivener Publishing · In publishing process
2025
Conf.
Big Data Workload Profiling for Energy-Aware Cloud Resource Management
DASET · Scotland
Jan 2026
Conf.
Reinforcement Learning For Dynamic Workflow Optimization In CI/CD Pipelines
17th IEEE CICN
Dec 2025 · IEEE
Conf.
Data Foundations for a Successful AI Strategy: A Blueprint for AI-Ready Enterprises
DASA · Bahrain
Dec 2025
Conf.
Detection of Advance Malware Threats using Hybrid Deep Learning Model and Image Analysis
IC3 (International Conference on Contemporary Computing)
Aug 2025 · 10.1109/IC3...
Conf.
Dynamic Context Tuning for Enhanced Multi-Turn Planning in Retrieval-Augmented Generation Systems
ICE2CPT
Oct 2025 · arXiv:2506.11092
Conf.
Combining Threat Intelligence with IoT Scanning to Predict Cyber Attacks
AIBThings (3rd International Conference)
Sept 2025 · arXiv:2411.17931
arXiv
Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit
arXiv [cs.SD]
Mar 2025 · arXiv:2503.21025
Journal
Edge Vs Cloud Computing Performance Trade-Offs for Real-Time Analytics
IJSEA · Vol. 14, Issue 06
Jun 2025 · 10.7753/IJSEA
Journal
Self-Healing Data Pipelines: A Fault-Tolerant Approach to ETL Workflows
IJETRM · Vol. 14, Issue 05
May 2025 · zenodo.15615306
Journal
Dynamic Resource Allocation in Serverless Architectures using AI-Based Forecasting
IJERT · Vol. 14, Issue 04
Apr 2025 · zenodo.18104614
→ Full list on Google Scholar · ORCiD · ResearchGate
Awards & Community

Recognition

2025
Rising Stars 30
The Channel Co. Computing
2024
40 Under Forty
AFCEA
2023
Young Achievers' Award
Indian Achievers' Forum
2022
Cheers Award
Cognizant
Speaking & Judging
Speaker
Google DevFest Bronx 2023 — The Tech Mentor's Journey: Navigating Challenges and Shaping Futures
Judge
MIT 100k Pitch & Accelerate Competition
Judge
Globee Awards — AI, Business & Technology categories (2023, 2025)
Judge
Jesse H. Neal Awards (2024, 2025, 2026)
Judge
Major League Hacking — Open Source, Web3Apps, AI Hackfest, SummerCodex 2025
Judge
Business Intelligence Group — Sammy Awards (2025)
Peer Reviewer
IEEE — ICETEG, RCAAI, ICoMMS, AGRETA, ISWTA, NMITCON, SSITCON, PuneCon (2025)
Ambassador
IEEE Day 2025 · AnitaB.org Grace Hopper Celebration #SpecSquad 2025
Editor
Associate Editor — International Journal of Engineering in Computer Science
Professional Memberships
Senior Member IEEE (SMIEEE)
Upsilon Pi Epsilon Member · UPE
PgUS
Sigma Xi Affiliate Circle
Computer Society IEEE
AAAS
Royal Fellow · IOASD
Fellow Member · SCRS
Lifetime Member · CSI
GDG New York
Volunteering & Mentorship
Mentor · Freedom Employability Academy (India)
Mentor · ENGin Ukraine
Volunteer · Cognizant Outreach (100+ hrs)
Mentor · TopMate.io
Academic Background

Education

🎓
Master of Science in Computer Science
Southern Arkansas University
Aug 2018 — May 2020
Magnolia, AR, USA
🏛️
Bachelor of Engineering in Computer Engineering
Gujarat Technological University
Aug 2011 — Jun 2016
India
Let's Build Something
GET IN TOUCH

Open to senior data engineering roles, research collaborations, speaking invitations, judging opportunities, and mentoring.