Hi — I’m Yash. I’m a software engineer in New York.
I like building things that make people faster and calmer: systems that don’t break when the inputs get messy,
and don’t require a detective board to debug.
Outside of work, I’m big on fitness, I’m always listening to music, and I travel a little aggressively.
Fun fact: I did 5 continents in 60 days. Next quest: getting scuba certified (so I can stop treating oceans
like “blue loading screens” on maps).
This site is intentionally more detailed than my resume. If you want the one-page version:
Resume PDF.
This is the timeline view. For deeper write-ups (the “how” and “why”), jump to Case Studies.
-
Bloomberg — Software Engineer → Senior Software Engineer (Jan 2021 – Present)
New York, NY, USA
I build production systems that turn streams, datasets, and long documents into workflows people can trust.
I care a lot about correctness, monitoring, and designing feedback loops so systems improve over time.
Dividend Projections Team (Signals + LLMs + Automation)
- Coverage framework + dashboard used across stakeholder groups.
- LLM workflows for cited summaries and structured extraction with strict numeric accuracy.
- Signals that prioritize reviews (weekly outliers + event triggers) and generate work items.
- Cross-listed automation: derivation logic + real-time event-driven updates.
Index Production Workflow Team (Data Pipelines + Quality)
- Automated ingestion workflows and anomaly detection/triage systems for large-scale production data.
-
American Family Insurance — Cloud Platform Engineer Intern (May 2020 – Aug 2020)
Madison, WI, USA
Built AWS automation (Lambda, CloudWatch, EC2) to improve incident response and reduce MTTR by 15%.
-
University of Illinois Urbana–Champaign — Graduate Teaching Assistant (CS225 Data Structures) (Jan 2020 – Dec 2020)
Urbana-Champaign, IL, USA
Designed and graded machine problems/exams for 900+ students; led weekly labs for 60+ students.
-
University of Illinois Urbana–Champaign — Graduate Research Assistant (FORWARD Data Lab, IBM/NSF-funded) (May 2019 – Dec 2019)
Urbana-Champaign, IL, USA
Research spanning entity-aware search and neural attention models for webpage understanding (including price detection).
-
Symbiosis Centre for Medical Image Analysis — Research Assistant (Mar 2018 – May 2019)
Pune, India
Worked on ML + neuroimaging; co-authored a paper in NeuroImage: Clinical (2019).
-
Symantec — Software Engineer Intern (Jan 2018 – Jun 2018)
Pune, India
Built an anti-phishing detection prototype for browser extensions using large-scale feature pipelines and supervised + semi-supervised ML models.
These case studies are intentionally more detailed than a resume. The goal is to show how I think: how I define problems,
design systems, handle edge cases, and measure outcomes. I keep them high-level and avoid proprietary details.
1) Trustworthy LLM Document Summarization
Problem: People needed high-signal facts from long documents quickly, but plain summarization wasn’t reliable when numbers mattered.
Approach: Build a pipeline that produces auditable outputs: summaries + structured fields where every claim/number is anchored to source pages/chunks, and missing info is explicitly “not disclosed.”
- Chunking: overlap windows to preserve boundary context.
- Aggregation: per-chunk extraction → canonical merged output.
- Conflict resolution: when candidates disagree, resolve using source text.
- Quality: evaluation rubric (LLM-as-a-Judge) + regression checks to prevent drift.
- Workflow: integrated into Jira so the output lands where work already happens.
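The chunk → extract → merge shape above can be sketched roughly as below. Field names, window sizes, and the merge rule are illustrative stand-ins; the real pipeline (and its LLM extraction step) isn't shown.

```python
def chunk(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows so facts that straddle a
    boundary still appear in at least one complete context."""
    chunks, start = [], 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

def merge(per_chunk: list[dict]) -> dict:
    """Merge per-chunk extractions into one record. Each value keeps the
    chunk it came from, so every claim stays auditable; an empty result
    is reported explicitly rather than silently dropped."""
    merged: dict = {}
    for chunk_id, record in enumerate(per_chunk):
        for field, value in record.items():
            # First answer wins in this sketch; the real pipeline
            # resolves conflicts against the source text instead.
            merged.setdefault(field, {"value": value, "source_chunk": chunk_id})
    return merged or {"status": "not disclosed"}
```

The overlap is the part that matters: without it, a number split across a chunk boundary is invisible to both chunks.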
Outcome: ~45% faster processing and ~1,000 hours/year saved across ~11.5K documents/year.
2) Coverage Framework → Algorithms → Pipeline → Dashboard
Problem: “Coverage” was a shared metric across multiple stakeholders, but definitions varied and decisions became inconsistent.
Approach: Partner with domain experts to define measurable universes and states, implement time-aware classification for transitions, and ship a pipeline + dashboard that becomes the shared source of truth.
- Definitions: eligibility, universes, schedule states, and views (security/company).
- Time-aware logic: model transitions (initiations/expansions/drops/re-initiations).
- Golden dataset nucleus: canonical layer designed to expand over time.
- Stakeholder UX: dashboards/reports for product and customer-facing teams.
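One way to picture the time-aware piece: compare consecutive coverage snapshots and name the transition. The two-state model and labels below are simplified stand-ins for the real definitions.

```python
from typing import Optional

def classify_transition(prev: Optional[str], curr: str) -> str:
    """Name the coverage transition between two snapshots of one
    security. 'covered'/'uncovered' are simplified stand-in states."""
    if prev is None:
        return "initiation" if curr == "covered" else "never_covered"
    if prev == "uncovered" and curr == "covered":
        return "re-initiation"
    if prev == "covered" and curr == "uncovered":
        return "drop"
    return "unchanged"

def transitions(history: list[str]) -> list[str]:
    """Walk a security's snapshot history and label each step."""
    labels, prev = [], None
    for curr in history:
        labels.append(classify_transition(prev, curr))
        prev = curr
    return labels
```

The point of making transitions first-class is that "covered right now" and "dropped and re-initiated twice this year" are very different signals, even though a point-in-time view can't tell them apart.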
3) Cross-Listed Securities Automation
Problem: Related listings of the same company across exchanges required manual syncing and created correctness risk.
Approach: Design cross-market derivation logic and build an event-driven real-time service that keeps related listings synchronized automatically.
Outcome: increased automated coverage by ~17% (~3,000 securities).
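The event-driven sync can be sketched as a handler that fans one event out to a company's sibling listings. Everything here (the event shape, `siblings`, the flat `store`) is a hypothetical stand-in for the real service and data layer.

```python
def propagate(event: dict, siblings: dict[str, list[str]],
              store: dict[str, dict]) -> list[str]:
    """Apply one dividend event to every cross-listed sibling of the
    affected company. 'siblings' maps company -> its listings; 'store'
    stands in for the real data layer."""
    updated = []
    for listing in siblings.get(event["company"], []):
        record = store.setdefault(listing, {})
        # Real derivation logic would handle currency and ratio
        # adjustments here; this sketch just mirrors the amount.
        record["projected_dividend"] = event["amount"]
        updated.append(listing)
    return updated
```

The correctness win comes from the derivation being written once: siblings can no longer drift apart because someone updated one listing by hand and forgot the others.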
4) Exchange Data Ingestion + Anomaly Detection Platform
Problem: Manual ingestion workflows created operational drag, and at scale, anomalies had to be easy to detect and triage.
Approach: Automate ingestion, record anomalies, and create a triage loop that scales (including UI integration and a supporting data layer).
Outcome: reduced index calculation errors and saved >3 hours/week for Ops/SRE.
5) Signals That Prioritize Reviews
- Outliers: weekly anomaly detection over ~40K securities to flag unusual yield/earnings patterns and generate work items.
- Event triggers: automated routing when key events occur (dividend declarations, capital changes, earnings/AGM announcements, delistings, major news).
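The weekly-outlier signal can be sketched as a simple cross-sectional screen. The z-score rule and threshold below are illustrative, not the production logic, which would be more robust (e.g. median/MAD) and segmented by sector.

```python
from statistics import mean, stdev

def weekly_outliers(yields: dict[str, float], z_cut: float = 3.0) -> list[str]:
    """Flag securities whose yield is a cross-sectional z-score outlier
    for the week; each flag would become a review work item."""
    values = list(yields.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # nothing varies, nothing to flag
    return [sec for sec, y in yields.items() if abs(y - mu) / sigma > z_cut]
```

The interesting design question isn't the statistic; it's routing: a flag is only useful if it lands as a work item in front of the right reviewer with the context attached.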
Placeholders for now. I’ll replace these with real project pages/links as I add them.
-
Project Placeholder #1
What it is: One-liner in plain English.
Why it exists: The itch it scratched / lesson learned.
Links: Code · Demo · Write-up
-
Project Placeholder #2
What it is: One-liner.
Links: Code · Demo
-
Project Placeholder #3
What it is: One-liner.
Links: Code