Hi — I’m Yash. I’m a software engineer in New York.
I like building things that make people faster and calmer: systems that don’t break when the inputs get messy,
and don’t require a detective board to debug.
Outside of work, I’m big on fitness, I’m always listening to music, and I travel a little aggressively.
Fun fact: I did 5 continents in 60 days. Next quest: getting scuba certified (so I can stop treating oceans
like “blue loading screens” on maps).
This site is intentionally more detailed than my resume. If you want the one-page version:
Resume PDF.
This is the timeline view. For deeper write-ups (the “how” and “why”), jump to Case Studies.
-
Bloomberg — Software Engineer → Senior Software Engineer (Jan 2021 – Present)
New York, NY, USA
I build production systems that turn streams, datasets, and long documents into workflows people can trust.
I care a lot about correctness, monitoring, and designing feedback loops so systems improve over time.
Dividend Projections Team (Signals + LLMs + Automation)
- Coverage framework + dashboard used across stakeholder groups.
- LLM workflows for cited summaries and structured extraction with strict numeric accuracy.
- Signals that prioritize reviews (weekly outliers + event triggers) and generate work items.
- Cross-listed automation: derivation logic + real-time event-driven updates.
Index Production Workflow Team (Data Pipelines + Quality)
- Automated ingestion workflows and anomaly detection/triage systems for large-scale production data.
-
American Family Insurance — Cloud Platform Engineer Intern (May 2020 – Aug 2020)
Madison, WI, USA
Built AWS automation (Lambda, CloudWatch, EC2) to improve incident response and reduce MTTR by 15%.
-
University of Illinois Urbana–Champaign — Graduate Teaching Assistant (CS225 Data Structures) (Jan 2020 – Dec 2020)
Urbana-Champaign, IL, USA
Designed and graded machine problems/exams for 900+ students; led weekly labs for 60+ students.
-
University of Illinois Urbana–Champaign — Graduate Research Assistant (FORWARD Data Lab, IBM/NSF-funded) (May 2019 – Dec 2019)
Urbana-Champaign, IL, USA
Research spanning entity-aware search and neural attention models for webpage understanding (including price detection).
-
Symbiosis Centre for Medical Image Analysis — Research Assistant (Mar 2018 – May 2019)
Pune, India
Worked on ML + neuroimaging; co-authored a paper in NeuroImage: Clinical (2019).
-
Symantec — Software Engineer Intern (Jan 2018 – Jun 2018)
Pune, India
Built an anti-phishing detection prototype for browser extensions using large-scale feature pipelines and supervised + semi-supervised ML models.
These case studies are intentionally more detailed than a resume. The goal is to show how I think: how I define problems,
design systems, handle edge cases, and measure outcomes. I keep them high-level and avoid proprietary details.
1) Trustworthy LLM Document Summarization
Problem: People needed high-signal facts from long documents quickly, but plain summarization wasn’t reliable when numbers mattered.
Approach: Build a pipeline that produces auditable outputs: summaries + structured fields where every claim/number is anchored to source pages/chunks, and missing info is explicitly “not disclosed.”
- Chunking: overlap windows to preserve boundary context.
- Aggregation: per-chunk extraction → canonical merged output.
- Conflict resolution: when candidates disagree, resolve using source text.
- Quality: evaluation rubric (LLM-as-a-Judge) + regression checks to prevent drift.
- Workflow: integrated into Jira so the output lands where work already happens.
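The chunk → extract → merge shape above can be sketched roughly as below. Field names, window sizes, and the merge rule are illustrative stand-ins; the real pipeline (and its LLM extraction step) isn't shown.

```python
def chunk(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows so facts that straddle a
    boundary still appear in at least one complete context."""
    chunks, start = [], 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

def merge(per_chunk: list[dict]) -> dict:
    """Merge per-chunk extractions into one record. Each value keeps the
    chunk it came from, so every claim stays auditable; an empty result
    is reported explicitly rather than silently dropped."""
    merged: dict = {}
    for chunk_id, record in enumerate(per_chunk):
        for field, value in record.items():
            # First answer wins in this sketch; the real pipeline
            # resolves conflicts against the source text instead.
            merged.setdefault(field, {"value": value, "source_chunk": chunk_id})
    return merged or {"status": "not disclosed"}
```

The overlap is the part that matters: without it, a number split across a chunk boundary is invisible to both chunks.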
Outcome: ~45% faster processing and ~1,000 hours/year saved across ~11.5K documents/year.
2) Coverage Framework → Algorithms → Pipeline → Dashboard
Problem: “Coverage” was a shared metric across multiple stakeholders, but definitions varied and decisions became inconsistent.
Approach: Partner with domain experts to define measurable universes and states, implement time-aware classification for transitions, and ship a pipeline + dashboard that becomes the shared source of truth.
- Definitions: eligibility, universes, schedule states, and views (security/company).
- Time-aware logic: model transitions (initiations/expansions/drops/re-initiations).
- Golden dataset nucleus: canonical layer designed to expand over time.
- Stakeholder UX: dashboards/reports for product and customer-facing teams.
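One way to picture the time-aware piece: compare consecutive coverage snapshots and name the transition. The two-state model and labels below are simplified stand-ins for the real definitions.

```python
from typing import Optional

def classify_transition(prev: Optional[str], curr: str) -> str:
    """Name the coverage transition between two snapshots of one
    security. 'covered'/'uncovered' are simplified stand-in states."""
    if prev is None:
        return "initiation" if curr == "covered" else "never_covered"
    if prev == "uncovered" and curr == "covered":
        return "re-initiation"
    if prev == "covered" and curr == "uncovered":
        return "drop"
    return "unchanged"

def transitions(history: list[str]) -> list[str]:
    """Walk a security's snapshot history and label each step."""
    labels, prev = [], None
    for curr in history:
        labels.append(classify_transition(prev, curr))
        prev = curr
    return labels
```

The point of making transitions first-class is that "covered right now" and "dropped and re-initiated twice this year" are very different signals, even though a point-in-time view can't tell them apart.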
3) Cross-Listed Securities Automation
Problem: Related listings of the same company across exchanges required manual syncing and created correctness risk.
Approach: Design cross-market derivation logic and build an event-driven real-time service that keeps related listings synchronized automatically.
Outcome: increased automated coverage by ~17% (~3,000 securities).
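The event-driven sync can be sketched as a handler that fans one event out to a company's sibling listings. Everything here (the event shape, `siblings`, the flat `store`) is a hypothetical stand-in for the real service and data layer.

```python
def propagate(event: dict, siblings: dict[str, list[str]],
              store: dict[str, dict]) -> list[str]:
    """Apply one dividend event to every cross-listed sibling of the
    affected company. 'siblings' maps company -> its listings; 'store'
    stands in for the real data layer."""
    updated = []
    for listing in siblings.get(event["company"], []):
        record = store.setdefault(listing, {})
        # Real derivation logic would handle currency and ratio
        # adjustments here; this sketch just mirrors the amount.
        record["projected_dividend"] = event["amount"]
        updated.append(listing)
    return updated
```

The correctness win comes from the derivation being written once: siblings can no longer drift apart because someone updated one listing by hand and forgot the others.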
4) Exchange Data Ingestion + Anomaly Detection Platform
Problem: Manual ingestion workflows created operational drag, and at scale, anomalies had to be easy to detect and triage.
Approach: Automate ingestion, record anomalies, and create a triage loop that scales (including UI integration and a supporting data layer).
Outcome: reduced index calculation errors and saved >3 hours/week for Ops/SRE.
5) Signals That Prioritize Reviews
- Outliers: weekly anomaly detection over ~40K securities to flag unusual yield/earnings patterns and generate work items.
- Event triggers: automated routing when key events occur (dividend declarations, capital changes, earnings/AGM announcements, delistings, major news).
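The weekly-outlier signal can be sketched as a simple cross-sectional screen. The z-score rule and threshold below are illustrative, not the production logic, which would be more robust (e.g. median/MAD) and segmented by sector.

```python
from statistics import mean, stdev

def weekly_outliers(yields: dict[str, float], z_cut: float = 3.0) -> list[str]:
    """Flag securities whose yield is a cross-sectional z-score outlier
    for the week; each flag would become a review work item."""
    values = list(yields.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # nothing varies, nothing to flag
    return [sec for sec, y in yields.items() if abs(y - mu) / sigma > z_cut]
```

The interesting design question isn't the statistic; it's routing: a flag is only useful if it lands as a work item in front of the right reviewer with the context attached.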
Placeholders for now. I’ll replace these with real project pages/links as I add them.
-
Project Placeholder #1
What it is: One-liner in plain English.
Why it exists: The itch it scratched / lesson learned.
Links: Code · Demo · Write-up
-
Project Placeholder #2
What it is: One-liner.
Links: Code · Demo
-
Project Placeholder #3
What it is: One-liner.
Links: Code