Chief Data Scientist

Selva Selvakkumaran.

Technical leader who thrives in building highly scalable systems, cloud-based infrastructures, data (science, search, analytics) platforms, optimization algorithms, and kick-ass teams.

Specialties: Architecting high-performance and highly scalable multi-cloud infrastructures for both data and operations. Machine Learning, Artificial Intelligence (AI), Search (ElasticSearch), Classification, Regression, Natural Language Processing (NLP), Information Retrieval (IR), Relevance detection, Recommender systems, Optimization Algorithms

Service platform for companies requiring assistance in setting up computer vision stacks

Optimized the platform to version models together with the data used to train them, under highly IO-intensive loads (computer vision).
Built a Content-Based Image Retrieval (CBIR) system that identifies and retrieves images by storing latent features as dense vectors in ElasticSearch (Approximate Nearest Neighbor search).
Mentored an offshore team of software engineers and computer vision experts on system design, DevOps, and optimized data handling.

Co-Founder

Enabled the team to deliver big data ingestion by making the platform robust to data skew and improving its performance by 22x.

Provided recommendations on cross-region backup strategies for a large data repository.

Senior Alexa Engineer

Bootstrapped and mentored a team of seven full-stack data scientists/engineers, who are responsible for ML models in production, recommendation engines, data engineering, and analytics.

Introduced the data lake concept to ServiceTitan, built one to host 100+ billion log records, and trained the engineering department on big data tools (Redshift Spectrum). Saved half a million dollars in the first year alone.

Developed a scalable federated data warehouse to mirror production data from 3+ million tables into a Redshift cluster. This warehouse is widely used across the company.

Built ETL tools for data manipulation, in addition to a serverless microservices API platform that supplies property data to production.

Embedded a Power BI platform to supply intelligence to 5000+ ServiceTitan customers about their business metrics. In addition, built a Metabase analytics platform to assist the internal customer success department.

Filed two patents based on the team's work.

Member of the executive team responsible for big data platforms, data science, and infrastructure DevOps. Optimized user engagement, content organization, search and speed of delivery. 

As the #2 technical hire, worked alongside the CTO to bootstrap the teams responsible for data integration, cloud infrastructure, QA, business intelligence, and software development.

Built from scratch a cost-efficient and secure, globally load-balanced infrastructure on AWS that served 438k requests per minute with an average response time of 20 ms.

Led and mentored the data science team in developing optimization solutions and scalable algorithms for petabyte-scale problems (tools: Hive, Spark, and Redshift).

Developed optimal pricing strategies for better inventory utilization (win rates improved from single digits to well over 50%) and thereby reduced inventory costs on average by 31%. 

Spearheaded investigation, identification, and mitigation of fraudulent behavior in Mobile RTB traffic.

Worked with the product team in providing big data-informed road maps for new products and features.

Liaised with and provided guidance to the Business Analyst team to improve clicks and conversions and help keep key clients happy on an ongoing basis.

Identified infrastructure efficiencies and worked with the development team to implement them in a closed loop. For example, preferential filtering strategies led to a 42% reduction in AWS overhead.

Based on these contributions, I was recognized as "Key Technical Talent" by Telenav.

Sr Data Scientist

Was solely responsible for data science and co-responsible for back-end systems.

Recommendations: Developed recommendation systems using an ensemble of Mahout algorithms (Mahout is ML on Hadoop). Adopted aspects of EdgeRank and other custom algorithms to measurably improve user engagement. Ran a series of A/B tests to introduce trending videos, recommended videos, and recommended topics to the frequency inbox.

NLP: Implemented named entity detection using common RDF data stores and other data services. The detected entities power both context-aware search and algorithms for related videos.

Search: Implemented a brand new ElasticSearch flow for mobile clients that integrates more domains than text and uses map-reduce style querying for vastly better results delivered with low latency.

Back-end Architecture: Optimized various aspects of the back-end architecture, which consists of a Java-based framework utilizing various big data stores and a caching/queuing layer in the AWS cloud. Introduced elements of the Lambda architecture for high performance and robustness. Developed a horizontal sharding strategy for 100x scalability.

DevOps: Was responsible for optimizing a large AWS cluster for better performance and lower cost.

Sr Machine Learning Engineer

Optimized an existing backend platform to quadruple the monthly revenue to 4M, in addition to achieving 5X more robustness.
Designed and architected the next-generation version of the same infrastructure to achieve ultimate robustness, flexibility, and performance.

Backend System Architect

•	Scalability Bottlenecks: Lead inter-departmental teams to resolve scalability bottlenecks that arise in hardware, software, and databases.
•	Hiring: Evaluate candidates applying for various positions for their technical knowledge.
•	Benchmarking: SAN configurations, Performance measurement using SQLIO, A/B testing of various algorithms, High throughput read/write to MySQL system, Experimentation of HBase/Hadoop, SSD deployment, etc.
•	Analytics: Furnish business intelligence reports. Also, build prototype tools that gather data from both internal sources and external sources and provide various metrics to the executive team.
•	Data Wrangling: Custom Python tools for managing backend data.
•	Development: Maintain and enhance Linux-based network-related software. Ad-hoc scripting.
•	QA and Testing: Be the agnostic party for holistic testing of critical portions of production software. Also, dive down and find root causes of technical issues that plague the entire infrastructure.
•	Management: In addition to leading the research team, provide technical expertise and training to members of other departments.

Director of Research

Developed distributed cross-platform systems for a network traffic analyzer application, network forensics, and data warehousing.
The network traffic analyzer application uses libnids in the back end to assemble TCP streams and utilizes Python routines for inspection. This multi-core application is capable of handling multi-gigabit traffic.
The data gathered from the application is hosted on a sharded MySQL database and on a SQL Server database.

Senior Developer

Congestion-aware implementation tools enable the use of chips with fewer metal layers, providing both the customers and the company significant cost savings. As part of the congestion reduction task force, I was responsible for developing diagnostic tools that graphically examine the congestion profile of NCDs, instituting benchmarking to validate various proposed solutions, and innovating placer algorithm modifications to alleviate congestion.

Joined the global optimization group that develops specialized algorithms to provide bleeding-edge performance to customers for various flows tailored to metrics such as fMax, area, and power. The initial project focused on improving the fMax and runtime of the area flow, and I delivered 4% better fMax and an 8% runtime reduction without any area penalty. Later, I focused on improving the fMax flow, where we delivered 6% better fMax (ISE 11.1) on Synplicity netlists when compared to the default non-global_opt flow.

Staff Software Engineer

Revamped data structures in a million-line C++ codebase to reduce memory consumption by 42.6% for large designs (available since ISE 9.1/9.2). My changes accounted for a third of the total memory savings reported. http://www.xilinx.com/prs_rls/2007/software/0786_ise92i.htm

Improved the global placement and detail placement algorithms through algorithm refactoring and other tuning to achieve a 2X speedup (available since ISE 10.1). Larger designs show a greater percentage of improvement. The changes are memory neutral and QoR neutral on average. This improvement saves the customer several hours in each run, allowing large-design customers to make two iterations per day rather than one.

Senior Software Engineer

Research Assistant -- Data Mining, Multilevel Clustering, Hypergraph Partitioning, Multi-Objective and Multi-Constraint handling in hypergraph partitioning and partitioning-driven placement.
Teaching Assistant -- Data Structures and Algorithms, Internet Programming, and Computer Architecture.

Research Assistant / Teaching Assistant

Monterey Design Systems had a highly integrated backend implementation tool (integrating synthesis, placement, and routing). The tool was multi-threaded and used a multi-level algorithm design for further speedup. I researched and invented some new clustering algorithms and metrics that reduced congestion by 32% on average.