Saumya Shah

M.S. in Computational Linguistics at UW · Summer Intern at Apple Siri · Google Summer of Code 2018
Nice to meet you! I recently graduated from the University of Washington with a Master's in Computational Linguistics. I have a Bachelor's in Computer Engineering, during which I explored various domains in Artificial Intelligence. My interests lie in Software Development, Machine Learning and Natural Language Processing. During the summer of my Master's, I worked at Apple on the Siri Video team in Seattle as a Machine Learning and NLP Engineering Intern. Apart from this, I participated in Google Summer of Code 2018 and have 15 months of experience working with leading startups in India.

Alongside these varied experiences, I strongly believe in giving back to my alma mater. I am still heavily involved with Unicode, a committee that I led during my undergrad, which encourages budding CS sophomores and juniors to contribute to open source projects and find their calling in tech. This initiative has been a great success, with many of our students receiving offers for international research internships, getting selected for Google Summer of Code and winning national-level hackathons.

In my spare time, I like to dabble in things far from the tech space, to consciously build an all-round personality. Since my move to the US, I have become an avid cook and love whipping up new recipes and trying different cuisines. Lately, I have been very interested in calligraphy, sketching and bullet journaling. I am a big productivity and minimalism nerd, often reading up on or trying out productivity tricks in an attempt to achieve a more balanced and mindful life.

Education

University of Washington

Master of Science in Computational Linguistics
3.80 in coursework
July 2019 - December 2020

Dwarkadas J. Sanghvi College of Engineering

Bachelor of Engineering in Computer Engineering
Mentor at DJ Unicode; Chairperson, DJ Literary Society (2017-2018); Co-InfoTech Head, Association for Computing Machinery (2016-2017)
July 2015 - June 2019

Experience

Machine Learning and NLP Engineering Intern

Apple

I revamped and redesigned the Showtimes feature for Siri, which lets a user ask Siri about theaters nearby. I was responsible for the new workflow and implementation of the feature, and I developed a new UI that is also compatible with the Siri Jr. interface. My code went into the Siri pipeline at the end of my internship. I also presented my work to the head of Siri Experience, who was impressed that I had developed a shippable product within 10 weeks of my 12-week internship.

June 2020 - September 2020

Machine Learning Intern

Mysuru Consulting Group

Worked on projects in Machine Learning and Natural Language Processing. Created a probabilistic Gamma-Poisson classifier for record linkage, which is now in production. Also worked on supervised topic modeling and attribute-metric extraction to automate compliance checking of ESG reports.

December 2017 - September 2018

Student Developer at Google Summer of Code 2018

Free UK Genealogy

Worked with Free UK Genealogy, where I developed a framework that performs Named Entity Recognition on data from wills and probate books from 18th-century England.

May 2018 - August 2018

Machine Learning Intern

i3systems India Pvt. Ltd.

Created Machine Learning and Deep Learning models for classification on a highly skewed dataset of identity cards.

June 2017 - August 2018

Research Experience

A collection of articles, presentations and publications.

Graduate Research Assistant - Family History Extraction using Clinical Narratives

December 2020

Journal Paper - Optimization of Rainwater Harvesting Sites using GIS

A simulation model that uses weather and soil data to optimize rainwater harvesting in a location and recommend construction of dams and canals using GIS.

January 2019

Creations

Automated Topic-Based Extractive Summarization System

My team and I built a summarization system that extracts the sentences from multiple documents that best represent a topic and uses them as the summary. This was a part of my LING 573 course on NLP Systems.

Named Entity Recognition using BERT and ELMo

As a part of my LING 575 course on Neural Language Models, my team and I built a Named Entity Recognizer using BERT and ELMo.

Automated Question Generation and Answer Verification using Digital Data

My final-year undergraduate project, which aims to automate question generation for images and verify the results using Visual Question Answering (VQA) techniques. Jointly learning these two tasks can improve Visual Question Generation (VQG) and also addresses the significant problem of dataset annotation. The thesis proposes integrating a VQG module with a VQA architecture to give an end-to-end architecture that self-evaluates its questions by creating QA pairs.

Gammalink

Built a Gamma-Poisson graphical model that provides a supervised solution for fuzzy matching of record pairs, also known as record linkage.

Wait! There's more...

See all Creations for more examples!

Open Source Contributions

Probate Parsing

Built as a student developer for Google Summer of Code with the organization Free UK Genealogy. I created an end-to-end system that extracts the text from probate books and seeds it into a database with entities such as name, county, date, relationships, etc.

Leadership

Senior Mentor

Unicode
I am passionate about knowledge transfer and actively work with DJ Unicode, a student-run organization that I led. It is an initiative to foster a culture of open-source development among students in their early years at college. I mentor students in backend web technologies and guide them in advanced CS subjects such as Machine Learning and Natural Language Processing. Unicode successfully completed 5 pilot projects (3 web applications and 2 Android applications) in its first six months. Unicode has grown from 45 developers in its first year to 80 developers in its second term, with 7 ongoing projects.
September 2017 - April 2019

Chairperson

DJ Literary Society
Organized events such as quizzes, debates, MUNs and JAMs (Just-a-Minute).
October 2017 - October 2018

Skills

Languages and Operating Systems
  • Python
  • Java
  • JavaScript
  • Linux
  • Ubuntu
  • Windows
  • Git
Database Technologies
  • MySQL
  • PostgreSQL
  • MongoDB
  • SQLite
  • Redis
Web Development
  • HTML5
  • CSS3
  • jQuery
  • Bootstrap
  • Django
DevOps
  • Heroku
  • Travis
  • AWS
  • Google Cloud Platform
Organization
  • Trello
  • Scrum
Tools and Frameworks
  • NumPy
  • Pandas
  • PyTorch
  • Keras
  • scikit-learn
  • NLTK
  • OpenCV
  • Gensim
  • spaCy
  • Tesseract-OCR
  • ArcGIS