Hi — I’m Yash. I’m a software engineer in New York.

I like building things that make people faster and calmer: systems that don’t break when the inputs get messy, and don’t require a detective board to debug.

Outside of work, I’m big on fitness, I’m always listening to music, and I travel a little aggressively. Fun fact: I visited 5 continents in 60 days. Next quest: getting scuba certified (so I can stop treating oceans like “blue loading screens” on maps).

This site is intentionally more detailed than my resume. If you want the one-page version: Resume PDF.

  • LLM document summarization: ~45% faster processing and ~1,000 hours/year saved across ~11.5K documents/year.
  • Cross-listed automation: increased automated coverage by ~17% (~3,000 securities) with real-time synchronization.
  • Signal-driven prioritization: weekly detection + routing over ~40K securities to generate actionable work items.
  • Anomaly workflows at scale: built systems designed around triaging anomalies across 100M+ daily data points.

This is the timeline view. For deeper write-ups (the “how” and “why”), jump to Case Studies.

  • Bloomberg — Software Engineer → Senior Software Engineer (Jan 2021 – Present)
    New York, NY, USA

    I build production systems that turn streams, datasets, and long documents into workflows people can trust. I care a lot about correctness, monitoring, and designing feedback loops so systems improve over time.

    Dividend Projections Team (Signals + LLMs + Automation)
    • Coverage framework + dashboard used across stakeholder groups.
    • LLM workflows for cited summaries and structured extraction with strict numeric accuracy.
    • Signals that prioritize reviews (weekly outliers + event triggers) and generate work items.
    • Cross-listed automation: derivation logic + real-time event-driven updates.
    Index Production Workflow Team (Data Pipelines + Quality)
    • Automated ingestion workflows and anomaly detection/triage systems for large-scale production data.

  • American Family Insurance — Cloud Platform Engineer Intern (May 2020 – Aug 2020)
    Madison, WI, USA
    Built AWS automation (Lambda, CloudWatch, EC2) to improve incident response and reduce MTTR by 15%.

  • University of Illinois Urbana–Champaign — Graduate Teaching Assistant (CS225 Data Structures) (Jan 2020 – Dec 2020)
    Urbana-Champaign, IL, USA
    Designed and graded machine problems/exams for 900+ students; led weekly labs for 60+ students.

  • University of Illinois Urbana–Champaign — Graduate Research Assistant (FORWARD Data Lab, IBM/NSF-funded) (May 2019 – Dec 2019)
    Urbana-Champaign, IL, USA
    Research spanning entity-aware search and neural attention models for webpage understanding (including price detection).

  • Symbiosis Centre for Medical Image Analysis — Research Assistant (Mar 2018 – May 2019)
    Pune, India
    Worked on ML + neuroimaging; co-authored a paper in NeuroImage: Clinical (2019).

  • Symantec — Software Engineer Intern (Jan 2018 – Jun 2018)
    Pune, India
    Built an anti-phishing detection prototype for browser extensions using large-scale feature pipelines and supervised + semi-supervised ML models.

These are intentionally more detailed than a resume. The goal is to show how I think: how I define problems, design systems, handle edge cases, and measure outcomes. I keep them high-level and avoid proprietary details.

1) Trustworthy LLM Document Summarization

Problem: People needed high-signal facts from long documents quickly, but plain summarization wasn’t reliable when numbers mattered.

Approach: Build a pipeline that produces auditable outputs: summaries + structured fields where every claim/number is anchored to source pages/chunks, and missing info is explicitly “not disclosed.”

  • Chunking: overlap windows to preserve boundary context.
  • Aggregation: per-chunk extraction → canonical merged output.
  • Conflict resolution: when candidates disagree, resolve using source text.
  • Quality: evaluation rubric (LLM-as-a-Judge) + regression checks to prevent drift.
  • Workflow: integrated into Jira so the output lands where work already happens.
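The chunking and aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: the window sizes, the merge policy, and the conflict representation are all hypothetical.

```python
def chunk_with_overlap(tokens, size=800, overlap=100):
    """Split a document's tokens into overlapping windows so that a fact
    straddling a chunk boundary appears whole in at least one chunk."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

def merge_field(candidates):
    """Merge per-chunk extractions for one field: agreement yields the
    value, disagreement surfaces a conflict for resolution against the
    source text, and no extraction becomes an explicit 'not disclosed'."""
    values = [c for c in candidates if c is not None]
    if not values:
        return "not disclosed"
    if len(set(values)) == 1:
        return values[0]
    return {"conflict": sorted(set(values))}
```

With the defaults, a 2,000-token document becomes three windows that each share 100 tokens with their neighbor, so boundary-spanning numbers are never split across every chunk that sees them.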

Outcome: ~45% faster processing and ~1,000 hours/year saved across ~11.5K documents/year.


2) Coverage Framework → Algorithms → Pipeline → Dashboard

Problem: “Coverage” was a shared metric across multiple stakeholders, but definitions varied and decisions became inconsistent.

Approach: Partner with domain experts to define measurable universes and states, implement time-aware classification for transitions, and ship a pipeline + dashboard that becomes the shared source of truth.

  • Definitions: eligibility, universes, schedule states, and views (security/company).
  • Time-aware logic: model transitions (initiations/expansions/drops/re-initiations).
  • Golden dataset nucleus: canonical layer designed to expand over time.
  • Stakeholder UX: dashboards/reports for product and customer-facing teams.
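The time-aware transition logic can be sketched as a week-over-week set comparison. This is a simplification of the real model: the three input sets and the state names are illustrative, and expansions are omitted for brevity.

```python
def classify_transitions(prev_covered, curr_covered, ever_covered):
    """Classify per-security coverage changes between two snapshots:
    newly covered securities are initiations (or re-initiations if they
    were covered at some point in the past); departures are drops."""
    events = {}
    for sec in curr_covered - prev_covered:
        events[sec] = "re-initiation" if sec in ever_covered else "initiation"
    for sec in prev_covered - curr_covered:
        events[sec] = "drop"
    return events
```

Keeping the historical set separate from the previous snapshot is what distinguishes a re-initiation from a first-time initiation, which matters when the two are reported differently downstream.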

3) Cross-Listed Securities Automation

Problem: Related listings of the same company across exchanges required manual syncing and created correctness risk.

Approach: Design cross-market derivation logic and build an event-driven real-time service that keeps related listings synchronized automatically.
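In spirit, the event-driven piece looks like the sketch below: a handler fans a corporate-action event out to a security's cross-listings. The event shape, the FX lookup, and the listing table are all hypothetical stand-ins, not the real service's interfaces.

```python
def on_dividend_event(event, cross_listings, fx_rates):
    """Propagate a dividend declared on a primary listing to its
    cross-listed counterparts, converting the amount into each
    listing's trading currency."""
    updates = {}
    for listing in cross_listings.get(event["security"], []):
        rate = (1.0 if listing["currency"] == event["currency"]
                else fx_rates[(event["currency"], listing["currency"])])
        updates[listing["id"]] = round(event["amount"] * rate, 6)
    return updates
```

Because the handler is triggered per event rather than on a polling schedule, related listings stay synchronized in near real time instead of drifting until the next batch run.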

Outcome: Increased automated coverage by ~17% (~3,000 securities).


4) Exchange Data Ingestion + Anomaly Detection Platform

Problem: Manual ingestion workflows created operational drag, and at scale, anomalies had to be easy to detect and triage.

Approach: Automate ingestion, record anomalies, and create a triage loop that scales (including UI integration and a supporting data layer).

Outcome: Reduced index calculation errors and saved >3 hours/week for Ops/SRE.


5) Signals That Prioritize Reviews

  • Outliers: weekly anomaly detection over ~40K securities to flag unusual yield/earnings patterns and generate work items.
  • Event triggers: automated routing when key events occur (dividend declarations, capital changes, earnings/AGM announcements, delistings, major news).
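As a minimal stand-in for the weekly outlier signal, here's a robust (median/MAD) z-score over a peer group. The threshold, the field, and the function name are illustrative assumptions, not the production logic.

```python
import statistics

def flag_yield_outliers(yields, threshold=3.0):
    """Return securities whose yield sits more than `threshold` robust
    z-scores from the peer median. MAD-based scaling keeps a few extreme
    values from inflating the spread and masking each other."""
    med = statistics.median(yields.values())
    mad = statistics.median(abs(v - med) for v in yields.values()) or 1e-9
    return {sec: v for sec, v in yields.items()
            if abs(v - med) / (1.4826 * mad) > threshold}
```

The robust scale matters at this size: with ~40K securities, a plain mean/standard-deviation z-score would let a handful of extreme yields widen the spread enough to hide genuine outliers.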

Placeholders for now. I’ll replace these with real project pages/links as I add them.

  • Project Placeholder #1
    What it is: One-liner in plain English.
    Why it exists: The itch it scratched / lesson learned.
    Links: Code · Demo · Write-up

  • Project Placeholder #2
    What it is: One-liner.
    Links: Code · Demo

  • Project Placeholder #3
    What it is: One-liner.
    Links: Code

  • PyCon Italy 2024 — “Code More, Draw Less, and Debug Just a Little!”
    Event page

  • PyCon Lithuania 2023 — “Code More, Draw Less, and Debug Just a Little!”
    Link (placeholder)

  • Software & Data Engineering: Python; Java; Kafka; REST APIs; Airflow; SQL; MongoDB; S3/Parquet; Lucene; Neo4j; Docker; Jenkins; Git; Splunk; Humio
  • LLM / AI Systems: LangChain; Vector DBs; RAG (chunking/aggregation); LLM evaluation (LLM-as-a-Judge)
  • University of Illinois at Urbana–Champaign (2019 – 2020)
    M.S. Computer Science · Merit-based full tuition waiver through TA/RA.

  • Symbiosis International University (2015 – 2019)
    B.Tech. Computer Science & Engineering · Dept Rank: 2 · Academic Excellence Award.

  • Leadership (Head roles): Editorial Board (CS/IT); TechFest (Advisory); REVERB (Student Relations).
  • Consultant, Illinois Business Consulting (IBC) (2019)
  • Senior Student Mentor, Symbiosis Mentor–Mentee Committee (2017 – 2018)
  • Drama Team: skit on interdisciplinary skills (Symbiosis Inauguration Programme)
  • Dance Team: 2nd place at Flash Mob competition (Season's Mall, Pune)