Analytics-as-a-service, or analytics job service, is emerging as a new paradigm for enterprise data analytics. These services are motivated by the fact that setting up and running data analytics is a major hurdle for enterprises. Although platform as a service (PaaS), software as a service (SaaS), and more recently database as a service (DBaaS) have eased the pain of provisioning and scaling hardware and software infrastructure, users are still responsible for managing and tuning their servers. A job service mitigates this pain by offering serverless analytics that does not require users to provision or manage servers. Instead, the service provider manages and tunes a query engine that can scale instantly and on demand. Users can get started quickly with the familiar SQL interface and pay only for the processing used by each query, rather than for the entire provisioned server infrastructure irrespective of the compute resources actually used.

At Microsoft, SCOPE is an analytics-as-a-service platform used for internal data analytics. SCOPE is deployed over hundreds of thousands of machines, running hundreds of thousands of production analytic jobs per day that are written by thousands of developers, processing several exabytes of data per day, and involving several hundred petabytes of I/O. SCOPE users are not required to manage or tune their hardware and software infrastructure, so they can concentrate on their processing logic alone. However, because the SCOPE job service is shared across many users and teams, partial computations overlap significantly: parts of the processing are duplicated across multiple jobs, generating redundant costs. The goal of CloudViews is to automatically detect and reuse overlapping computations in the SCOPE job service while allowing users to write their jobs just as before, i.e., with zero changes to user scripts.
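
To make the idea of overlapping computations concrete, the following is a minimal illustrative sketch in Python, not the CloudViews or SCOPE implementation. It assumes each job is available as a tree of logical operators and gives every subtree a hash signature, so that identical subexpressions appearing in different jobs collide on the same signature. The names `Op`, `signature`, and `find_overlaps`, as well as the example jobs, are hypothetical and introduced only for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One node of a hypothetical logical query plan."""
    name: str             # operator and its parameters, e.g. "Filter(country='US')"
    children: tuple = ()  # input operators, in order


def signature(op, out=None):
    """Return a hash identifying the subexpression rooted at `op`.

    The hash covers the operator and, recursively, all of its inputs.
    If `out` is a set, the signature of every subtree is added to it.
    """
    child_sigs = [signature(child, out) for child in op.children]
    text = op.name + "(" + ",".join(child_sigs) + ")"
    sig = hashlib.sha256(text.encode()).hexdigest()[:16]
    if out is not None:
        out.add(sig)
    return sig


def find_overlaps(jobs):
    """Map each signature found in two or more jobs to the names of those jobs."""
    seen = defaultdict(set)
    for job_name, root in jobs.items():
        sigs = set()
        signature(root, sigs)
        for sig in sigs:
            seen[sig].add(job_name)
    return {sig: names for sig, names in seen.items() if len(names) > 1}


# Two hypothetical jobs that both filter the same input before diverging.
scan = Op("Scan(clicks)")
filt = Op("Filter(country='US')", (scan,))
job_a = Op("GroupBy(user)", (filt,))
job_b = Op("Join(users)", (filt, Op("Scan(users)")))

overlaps = find_overlaps({"job_a": job_a, "job_b": job_b})
for sig, names in overlaps.items():
    print(sig, sorted(names))
```

In this toy example the scan and the filter are shared by both jobs, so their results are candidates for computing once and reusing. Which shared subexpressions are actually worth materializing depends on factors such as storage cost and how often they recur, which this sketch does not model.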

Talks and Posters
  • A Jindal, K Karanasos, HS Patel, S Rao Sriram
    Selection of Subexpressions to Materialize for Datacenter Scale
    US Patent App. 15/884,282

  • A Jindal, H Patel, Q Shi, J Di, MK Bag, Z Yin
    Computation Reuse in Analytics Job Service
    US Patent App. 15/952,347