Yu Xu

Scalable Graph Platform for Advanced Analytics and ML

San Francisco, CA, US

About

26 US patents (17 issued & 9 pending) in the areas of parallel computing, large scale data analysis, information retrieval, and data management. 13 published papers at top database conferences. Initiated and leading MapReduce R&D work at Teradata. Responsible for Teradata’s Hadoop MapReduce Roadmap. Initiated and created the Teradata connector for Hadoop, which led to Teradata and Cloudera’s partnership. Invented industrial-strength data skew solutions for parallel inner/outer joins.

Program Committee member of the VLDB 2011 Industrial, Applications, and Experience Track. Invited panelist on MapReduce and data warehousing solutions at the ACM International Workshop on Data Warehousing and OLAP 2010. Program Committee member of the ACM International Workshop on Data Warehousing and OLAP 2010. Program Committee member of the International Conference on Data Engineering (ICDE) 2006. Teradata Certified Master V2R5.

Specialties: parallel computing, database management, Hadoop, MapReduce, large scale data analysis, data warehouse, information retrieval in semi-structured data, keyword search in XML documents, spatial databases, scalable web system development, search engine optimization, XML language and query engine, Pig Latin, Cassandra, HBase, Hive

Work experience

  1. May 2023 – present

    TigerGraph

    Founder and CTO
  2. January 2012 – May 2023

    TigerGraph

    Founder and CEO
    We’re hiring: System Engineers, Solution Architects, Solution Engineers, Sales Representatives, Marketing Director, Marketing Program Manager… http://www.tigergraph.com/join-us/
  3. January 2011 – January 2012

    Twitter

    Data Analytics Engineer
  4. January 2006 – January 2011

    Teradata

    Hadoop MapReduce architect and team leader
    26 patents (9 pending & 17 issued); 60 internal Invention Disclosures; 6 papers from work published at top database conferences. Many of the ideas I came up with for optimizing parallel computing apply both to parallel DBMSs and to Hadoop. Responsible for Teradata's MapReduce Roadmap and advanced R&D work to answer new marketing challenges and cutting-edge customer requirements.

    Project highlights

    Hadoop MapReduce: I initiated the Hadoop/MapReduce research, prototyping, and experiments at Teradata in early 2008, well before Teradata saw any customer interest. My work was used to answer customer inquiries and provide marketing materials. Part of my work went into Teradata Developer articles and was highlighted at Teradata's Partners Conference as a major part of Teradata's MapReduce strategy in 2009. In 2010, I made the Hadoop MapReduce part of the largest government benchmark project at Teradata possible by helping the benchmark group with Hadoop installation and tuning, and by fixing Hadoop out-of-memory and data skew issues. I initiated the TeradataInputFormat approach to allow MapReduce programs direct parallel access to Teradata, which formed the foundation for Teradata and Cloudera's partnership. I created a Tweets/Retweets social analysis demo for the Teradata Partners Conference 2011 using the Twitter Streaming API. I led and demonstrated two Teradata Hadoop demos at the 2010 Teradata Partners Conference.

    Data skew handling: Data skew has been a fundamental problem since the beginning of parallel DBMSs because a node doing too much work slows down the whole system. Despite more than 30 years of extensive academic research and industry effort, no effective skew handling mechanism had been implemented by any major parallel DBMS. I invented the PRPD algorithm (published in SIGMOD 2008), the first industrial-strength and cost-efficient data skew solution. Patents filed and papers published. The new PRPD join plan is available in production Teradata 14.
  5. January 2006 – January 2011

    Teradata

    Hadoop Architect
  6. January 2005 – January 2006

    IBM

    Staff Software Engineer
  7. January 2004 – January 2004

    IBM Almaden Research Center

    Research Intern
  8. January 2002 – January 2002

    AT&T Labs Research

    Research Intern
  9. January 2000 – January 2001

    Enosys

    Software Engineer

Education

  1. 1999 – 2005

    University of California, San Diego

    Ph.D, Computer Science