AWS Data Exchange for Amazon Redshift

Jeff Barr, Chief Evangelist for AWS, is certainly excited about the new AWS Data Exchange for Amazon Redshift.

Back in 2019, Jeff introduced AWS Data Exchange and highlighted how you could find, subscribe to and use a wide range of data products. Over the last two years, Data Exchange has rapidly expanded to include over 3600 data products across the following ten categories:

• Financial Services
• Retail, Location & Marketing
• Public Sector
• Healthcare & Life Sciences
• Resources
• Media & Entertainment
• Telecommunications
• Manufacturing
• Automotive
• Gaming

With these comprehensive datasets, users could download the data into an Amazon Simple Storage Service (Amazon S3) bucket and then choose from a wide variety of processing options, such as AWS Lambda functions, an AWS Glue crawler, or Amazon Athena queries.

So how can this data exchange work within Amazon Redshift?

As a subscriber, you can now use data from providers directly, with no further processing and no Extract, Transform, Load (ETL) process! Because you don’t have to do any processing, the data is always current and can be used directly in your Amazon Redshift queries.
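To make the subscriber side a little more concrete, here is a minimal sketch in Python that only builds the two SQL statements a subscriber would typically run: one to create a local database from the shared data, and one to query it with three-part notation. The function name and every value passed to it (database, datashare, account ID, namespace, schema and table) are hypothetical placeholders rather than details from a real listing, so treat it as an illustration and not a definitive implementation.

```python
import re

# Plain SQL identifiers only, so placeholder names cannot smuggle in extra SQL.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_ACCOUNT = re.compile(r"[0-9]{12}")
_NAMESPACE = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def subscriber_statements(local_db, datashare, provider_account,
                          provider_namespace, schema, table, limit=10):
    """Build the SQL a subscriber might run against a shared data product.

    All arguments are hypothetical example values; the real ones come from
    the datashare you are entitled to after subscribing.
    """
    for name in (local_db, datashare, schema, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    if not _ACCOUNT.fullmatch(provider_account):
        raise ValueError("provider_account must be a 12-digit AWS account ID")
    if not _NAMESPACE.fullmatch(provider_namespace):
        raise ValueError("provider_namespace must be a GUID")

    # Step 1: expose the provider's datashare as a local database.
    create_db = (
        f"CREATE DATABASE {local_db} FROM DATASHARE {datashare} "
        f"OF ACCOUNT '{provider_account}' NAMESPACE '{provider_namespace}';"
    )
    # Step 2: query the shared table in place, with no copy and no ETL step.
    sample_query = (
        f"SELECT * FROM {local_db}.{schema}.{table} LIMIT {int(limit)};"
    )
    return [create_db, sample_query]


if __name__ == "__main__":
    for statement in subscriber_statements(
        "adx_demo", "sample_share", "123456789012",
        "01234567-89ab-cdef-0123-456789abcdef", "public", "places",
    ):
        print(statement)
```

The statements can then be run from whichever SQL client you already use with your Redshift cluster.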

AWS Data Exchange for Amazon Redshift takes care of managing all entitlements and payments for you, with all charges billed to your AWS account. Super easy stuff!

Viewing subscribed data sets within AWS Data Exchange.

In the AWS Insight, Jeff takes viewers through two key vantage points: subscribing to a data product and publishing a data product. In Jeff’s own words:

“It was cool to realize just how many existing aspects of Redshift and Data Exchange played central roles. Because Redshift has a clean separation of storage and compute, along with built-in data sharing features, the data provider allocates and pays for storage and the data subscriber does the same for compute.

The provider does not need to scale their cluster in proportion to the size of their subscriber base, and can focus on acquiring and providing data.”

We couldn’t agree with you more, Jeff. Not having to scale clusters in line with the number of subscribers saves considerable cost as well as time.
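For the provider side Jeff describes, a rough sketch of the SQL involved might look like the following. It is written as a small Python helper that only assembles the statements: create a datashare managed by AWS Data Exchange, then add a schema and its tables to it. The helper name and the datashare, schema and table names are hypothetical examples, and the remaining publishing steps in the AWS Data Exchange console are not shown.

```python
import re

# Plain SQL identifiers only, so example names cannot inject extra SQL.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def provider_statements(datashare, schema, tables):
    """Build the SQL a provider might run to prepare a datashare.

    The names passed in are hypothetical; substitute your own objects.
    """
    for name in (datashare, schema, *tables):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"not a simple identifier: {name!r}")

    statements = [
        # MANAGEDBY ADX marks the datashare as managed by AWS Data Exchange.
        f"CREATE DATASHARE {datashare} MANAGEDBY ADX;",
        f"ALTER DATASHARE {datashare} ADD SCHEMA {schema};",
    ]
    # Each shared table is added explicitly; storage stays with the provider.
    statements += [
        f"ALTER DATASHARE {datashare} ADD TABLE {schema}.{table};"
        for table in tables
    ]
    return statements


if __name__ == "__main__":
    for statement in provider_statements("sample_share", "public", ["places"]):
        print(statement)
```

Because subscribers query the shared tables with their own compute, nothing in this sketch changes as the number of subscribers grows.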

Creating a new data set within AWS Data Exchange.

As Jeff highlights, multiple data providers are working to make their data products and sets available to users via AWS Data Exchange for Amazon Redshift.

Here are just a few of the initial offerings:

FactSet Supply Chain Relationships – FactSet Revere Supply Chain Relationships data is built to expose business relationship interconnections among companies globally. This feed provides access to the complex networks of companies’ key customers, suppliers, competitors, and strategic partners, collected from annual filings, investor presentations, and press releases.

Foursquare Places 2021: New York City Sample – This trial dataset contains Foursquare’s integrated Places (POI) database for New York City, accessible as a Redshift Data Share. Instantly load Foursquare’s Places data into a Redshift table for further processing and analysis. Foursquare data is privacy-compliant, uniquely sourced, and trusted by top enterprises like Uber, Samsung, and Apple.

Mathematica Medicare Pilot Dataset – Aggregate Medicare HCC counts and prevalence by state, county, payer, and filtered to the diabetic population from 2017 to 2019.

COVID-19 Vaccination in Canada – This listing contains sample datasets for COVID-19 Vaccination in Canada data.

Revelio Labs Workforce Composition and Trends Data (Trial data) – Understand the workforce composition and trends of any company.

Facteus – US Card Consumer Payment – CPG Backtest – Historical sample from panel of SKU-level transaction detail from cash and card transactions across hundreds of Consumer-Packaged Goods sold at over 9,000 urban convenience stores and bodegas across the U.S.

Decadata Argo Supply Chain Trial Data – Supply chain data for CPG firms delivering products to US Grocery Retailers.

Ready to trial the features of AWS Data Exchange for Amazon Redshift?

If you’re looking to work with larger data sets and need the experience and guidance of a member of our data team, please get in touch below.