Web Scraping With Node.js

Web scraping is a software technique for extracting information from websites. As the volume of data on the internet has grown, web scraping has become more in demand, and a number of services have emerged to provide it. Most of them, however, are costly, limited, or have other disadvantages. Node.js, which is versatile and completely free, lets you avoid these problems by writing your own scraper.

Modules

npm is the Node.js package manager. It is installed alongside Node.js and makes using modules as easy as possible. By default, npm installs modules into a folder named node_modules in the directory where you run it. These are the modules that will be used here.

Request

The Request module wraps Node's HTTP and HTTPS interfaces (used for downloading data from the internet), abstracts away their difficulties, and presents a single unified interface for making requests, so web pages can be downloaded directly.
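To illustrate what such a unified interface hides, here is a minimal sketch that uses only Node's built-in http and https modules. The helper names clientFor and download are hypothetical and chosen for illustration; this is not how the Request module is implemented, only an approximation of the work it saves you.

```javascript
var http = require("http");
var https = require("https");

// Hypothetical helper: pick the built-in module that matches the URL's protocol.
function clientFor(url) {
    return new URL(url).protocol === "https:" ? https : http;
}

// Hypothetical helper: download a page and pass the result to an
// error-first callback, mirroring the (error, response, body) shape.
function download(url, callback) {
    clientFor(url)
        .get(url, function (response) {
            var body = "";
            response.setEncoding("utf8");
            // The body arrives in chunks, so collect them until the stream ends.
            response.on("data", function (chunk) {
                body += chunk;
            });
            response.on("end", function () {
                callback(null, response, body);
            });
        })
        .on("error", function (error) {
            callback(error);
        });
}
```

A call such as download(url, function (error, response, body) { ... }) would then behave roughly like the request call used later in this post, minus conveniences such as redirect handling.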

Cheerio

Cheerio lets you work with downloaded web data using the same syntax that jQuery employs. It is a fast, flexible, and lean implementation of core jQuery, which lets us focus on the data we have downloaded.

Implementation

The code below is a quick little application to nab the temperature from a weather website.

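var request = require("request"),
    cheerio = require("cheerio"),
    // The ZIP code is a string so that its leading zero is preserved.
    url = "http://www.wunderground.com/cgi-bin/findweather/getForecast?&query=" + "02888";

// Download the page; the callback receives the error, the response, and the body.
request(url, function (error, response, body) {
    if (!error) {
        // Load the HTML into Cheerio and select the temperature element.
        var $ = cheerio.load(body),
            temperature = $("[data-variable='temperature'] .wx-value").html();

        console.log("It's " + temperature + " degrees Fahrenheit.");
    } else {
        console.log("We've encountered an error: " + error);
    }
});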

First, we require our modules so that we can access them later on, then define the URL we want to download in a variable.
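One detail worth care when defining the URL is the ZIP code: written as a number, it cannot keep a leading zero, so it is safer to treat it as a string. The sketch below shows one way to do that. The forecastUrl helper and the BASE_URL constant are hypothetical names introduced for illustration only.

```javascript
// Base address taken from the example in this post.
var BASE_URL = "http://www.wunderground.com/cgi-bin/findweather/getForecast?&query=";

// Hypothetical helper: build the forecast URL for a ZIP code.
function forecastUrl(zipCode) {
    // Reject numbers: a numeric ZIP code cannot carry a leading zero.
    if (typeof zipCode !== "string") {
        throw new TypeError("ZIP code must be a string");
    }
    // Encode the value so that unexpected characters cannot break the query string.
    return BASE_URL + encodeURIComponent(zipCode);
}

console.log(forecastUrl("02888"));
```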

Then we use the request function to download the page at that URL. Once the request completes, the callback is invoked with three arguments: error, response, and body. If the page fails to download, an error object is passed to the callback; otherwise error is null and body holds the page's HTML. To find the right selector, navigate to the URL in a browser, locate the big green temperature element, and inspect it. The code then selects that element with Cheerio to read the data.
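The error-first callback convention described above can be tried out without any network access. The sketch below uses a hypothetical describeResult function and assumes the response object exposes a statusCode property, as Node's HTTP responses do. The status check is an addition for illustration and is not part of the original example.

```javascript
// Hypothetical handler following the (error, response, body) convention.
function describeResult(error, response, body) {
    // A failed download passes an error object as the first argument.
    if (error) {
        return "We've encountered an error: " + error.message;
    }
    // A completed request can still report a failure through its status code.
    if (response.statusCode !== 200) {
        return "Unexpected status code: " + response.statusCode;
    }
    // On success, error is null and body holds the downloaded HTML.
    return "Downloaded " + body.length + " characters.";
}

// Simulated outcomes, standing in for what a real request would pass in.
console.log(describeResult(new Error("connection refused")));
console.log(describeResult(null, { statusCode: 404 }, ""));
console.log(describeResult(null, { statusCode: 200 }, "<html></html>"));
```

Keeping the handler as a named function makes it easy to exercise each branch before pointing it at a live website.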

Reference: http://www.smashingmagazine.com/2015/04/08/web-scraping-with-nodejs/

Latest posts by Rahul Huria (see all)
