What Is Data Scraping and How Can You Use It?
What Is Data Scraping?
Data scraping, also known as web scraping, is the process of importing information from a website into a spreadsheet or local file saved on your computer. It’s one of the most efficient ways to get data from the web, and in some cases to channel that data to another website. Popular uses of data scraping include:
- Research for web content/business intelligence
- Pricing for travel booking sites/price comparison sites
- Finding sales leads/conducting market research by crawling public data sources (e.g. Yell and Twitter)
- Sending product data from an e-commerce site to another online vendor (e.g. Google Shopping)
And that list’s just scratching the surface. Data scraping has a vast number of applications – it’s useful in just about any case where data needs to be moved from one place to another.
The basics of data scraping are relatively easy to master. Let’s go through how to set up a simple data scraping action using Excel.
Data scraping with dynamic web queries in Microsoft Excel
A dynamic web query in Microsoft Excel is an easy, versatile data scraping method: it sets up a data feed from an external website (or multiple websites) into a spreadsheet.
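If you'd like to see the same idea outside Excel, here is a minimal Python sketch of what a web query does under the hood: it reads the rows of an HTML table and writes them out in spreadsheet-friendly CSV form. It uses only Python's standard library. The names TableParser, table_to_csv and SAMPLE_HTML are our own illustrative choices, and the sample table is made up, so treat this as a rough sketch rather than a description of how Excel works internally.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample page content, standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Kettle</td><td>19.99</td></tr>
  <tr><td>Toaster</td><td>24.50</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML))
```

In real use you would download the page first (for example with urllib.request.urlopen from the standard library) and pass its text to table_to_csv, then save the result as a .csv file that opens directly in a spreadsheet. Real pages are often messier than this sample, so a dedicated HTML parsing library may be a better fit for anything beyond simple tables.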
Automated data scraping with tools
Getting to grips with dynamic web queries in Excel is a useful way to gain an understanding of data scraping. However, if you intend to use data scraping regularly in your work, you may find a dedicated data scraping tool more effective.
Here are our thoughts on a few of the most popular data scraping tools on the market:
Data Scraper (Chrome plugin)
Data Scraper slots straight into your Chrome browser extensions, allowing you to choose from a range of ready-made data scraping “recipes” to extract data from whichever web page is loaded in your browser.
This tool works especially well with popular data scraping sources like Twitter and Wikipedia, as the plugin includes a greater variety of recipe options for such sites.
We tried Data Scraper out by mining a Twitter hashtag, “#journorequest”, for PR opportunities, using one of the tool’s public recipes. Here’s a flavour of the data we got back: