Make money with Oziconnect referral program

Amazon is one of the most popular e-commerce websites for web scrapers, with billions of product pages scraped every month.

It also has a huge database of product reviews, which can be very useful for market research and competitor monitoring.

You can extract relevant data from Amazon websites and save it in spreadsheet or JSON format. You can also automate the process of regularly updating your data.

Scraping Amazon product reviews isn’t always easy, especially when a login is required. In this guide, you will learn how to scrape Amazon product reviews after logging in. It covers the login process, parsing the review data, and exporting the reviews to CSV.

Let’s get started.

Prerequisites and project setup

We will collect Amazon reviews using Puppeteer, a Node.js library. Make sure Node.js is installed on your system. If it is not, visit the official Node.js website and install it.

After installing Node.js, install Puppeteer. Puppeteer is a Node.js library that provides a high-level, user-friendly API for automating tasks and interacting with dynamic web pages.

Next, let’s install and configure Puppeteer.

Open a terminal and create a new folder with any name. (In my case it is amazon_reviews).

mkdir amazon_reviews

Change the current directory to the folder created above.

cd amazon_reviews

You are now in the correct directory. Run the following command to initialize a package.json file:

npm init -y

Finally, install Puppeteer using the following command:

npm install puppeteer

Next, open the folder in your favorite code editor and create a new JavaScript file (index.js). Make sure your hierarchy looks like this:

[Screenshot: folder hierarchy showing node_modules, index.js, package-lock.json, and package.json]

Everything was set up successfully. Now you’re ready to code your scraper.
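
Before any scraping can happen, index.js needs some scaffolding that launches a browser and opens a page. The sketch below is one possible shape for that scaffolding, not the code from this tutorial. The helper name withPage is hypothetical, and the Puppeteer module is passed in as an argument so the helper can be tried without a real browser.

```javascript
// Hypothetical helper (not from the original tutorial): opens a browser,
// hands a fresh page to `task`, and always closes the browser afterwards.
async function withPage(puppeteer, task) {
  // headless: false shows the browser window, which helps when debugging a login flow.
  const browser = await puppeteer.launch({ headless: false });
  try {
    const page = await browser.newPage();
    return await task(page);
  } finally {
    await browser.close();
  }
}

// Usage in index.js (assumes `npm install puppeteer` has been run):
// const puppeteer = require("puppeteer");
// withPage(puppeteer, async (page) => {
//   await page.goto("https://www.amazon.com/");
//   console.log(await page.title());
// });
```

Closing the browser in a finally block means a failed selector or timeout does not leave a stray browser process behind.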

Note: Make sure you have an account with Amazon so you can proceed with the rest of this tutorial.

Step 1: Visit the public page

We will scrape reviews for the product listed below, extracting the author’s name, the review title, and the date.

The product URL is: https://www.amazon.com/ENHANCE-Headphone-Customizable-Lighting-Flexible/dp/B07DR59JLP/

[Screenshot: the product used in the example, a pair of headphones]

First, we will log in to Amazon and then navigate to the product’s URL to collect reviews.

Step 2: Scrape behind the login

Amazon uses a multi-step login process: the user enters a username or email, clicks the Continue button, enters the password, and finally submits it. The username and password fields are typically on separate pages.

Use the selector input[name=email] to enter your email ID.

[Screenshot: HTML of the sign-in field]

Then use the selector input[id=continue] to click the Continue button.

[Screenshot: HTML of the Continue button]

You should now see the password page. Use the selector input[name=password] to enter your password.

[Screenshot: HTML of the password field]

Finally, use the selector input[id=signInSubmit] to click the “Sign in” button.

[Screenshot: HTML of the Sign in button]

Here is the code for the login process:

const selectors = {
  emailid: 'input[name=email]',
  password: 'input[name=password]',
  continue: 'input[id=continue]',
  signin: 'input[id=signInSubmit]',
};

    // Runs inside an async function; signinURL holds the Amazon sign-in page URL.
    await page.goto(signinURL);
    await page.waitForSelector(selectors.emailid);
    // Type with a 100 ms delay between keystrokes.
    await page.type(selectors.emailid, "satyam@gmail.com", { delay: 100 });
    await page.click(selectors.continue);
    await page.waitForSelector(selectors.password);
    await page.type(selectors.password, "mypassword", { delay: 100 });
    // Start waiting for navigation before clicking so the navigation is not missed.
    await Promise.all([
      page.waitForNavigation(),
      page.click(selectors.signin),
    ]);

The code follows the steps described above. It first goes to the sign-in URL, enters the email ID, and clicks the Continue button. It then enters the password, clicks the “Sign in” button, and waits for the resulting navigation to complete.
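
The login snippet hardcodes the email and password. One common alternative is to read them from environment variables so they stay out of the source file. The sketch below assumes two variable names, AMAZON_EMAIL and AMAZON_PASSWORD, and a helper called getCredentials. All three are illustrative choices and not part of the original code.

```javascript
// Hypothetical helper: reads login credentials from environment variables.
// AMAZON_EMAIL and AMAZON_PASSWORD are assumed names, not from the tutorial.
function getCredentials(env = process.env) {
  const email = env.AMAZON_EMAIL;
  const password = env.AMAZON_PASSWORD;
  if (!email || !password) {
    throw new Error("Set AMAZON_EMAIL and AMAZON_PASSWORD before running the scraper");
  }
  return { email, password };
}

// Usage inside the login code:
// const { email, password } = getCredentials();
// await page.type(selectors.emailid, email, { delay: 100 });
// await page.type(selectors.password, password, { delay: 100 });
```

Failing early with a clear error is a deliberate choice here: without it, page.type would be called with undefined and the failure would be harder to trace.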

Once the sign-in process is complete, the script navigates to the product page to collect reviews.

[Screenshot: the product page]

Step 3: Parse the review data

You have successfully logged in and are now viewing the product page you want to scrape. Next, let’s parse the review data.

This page contains several reviews. They sit inside a parent div with the ID cm-cr-dp-review-list, which holds all reviews for the current page. To access more reviews, you will need to use pagination to navigate to the next page.

This parent div has multiple child divs, and each child div holds one review. To extract the reviews, you can use the selector #cm-cr-dp-review-list div.review.

const selectors = {
  allReviews: '#cm-cr-dp-review-list div.review',
  authorName: 'div[data-hook="genome-widget"] span.a-profile-name',
  reviewTitle: '[data-hook=review-title]>span:not([class])',
  reviewDate: 'span[data-hook=review-date]',
};

The allReviews selector first goes to the element with the ID cm-cr-dp-review-list and then finds all div elements inside it that have the class review.

[Screenshot: review data, such as author name, review title, and description]

The following code snippet first navigates to the product URL, waits for the review selector to load, and then stores all matching review elements in the reviewElements variable.

await page.goto(productURL);
await page.waitForSelector(selectors.allReviews);
const reviewElements = await page.$$(selectors.allReviews);

Next, let’s extract the author name, review title, and date.

[Screenshot: the target author name, review title, and date]

To parse the author name, you can use the selector div[data-hook="genome-widget"] span.a-profile-name. This selector first finds the div element whose data-hook attribute is set to genome-widget, because the name sits inside this div. It then finds the span element with the class a-profile-name, which is the element that contains the author’s name.

const author = await reviewElement.$eval(selectors.authorName, (element) => element.textContent);

To parse the title of a review, you can use the CSS selector [data-hook="review-title"] > span:not([class]). This selector matches span elements that are direct children of the [data-hook="review-title"] element and have no class attribute.

const title = await reviewElement.$eval(selectors.reviewTitle, (element) => element.textContent);

To parse the date, you can use the CSS selector span[data-hook="review-date"]. This selector matches the span element whose data-hook attribute is set to review-date, which is the element that contains the review date.

const rawReviewDate = await reviewElement.$eval(selectors.reviewDate, (element) => element.textContent);

Note that this retrieves the entire text, including the location, not just the date. So we need to extract the date from the text using a regular expression pattern.
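
As a small worked example of that extraction, the sketch below wraps the regular expression step in a function and runs it on a sample string. The function name extractReviewDate is hypothetical. The pattern (month name, day, comma, four-digit year) and the sample text are assumptions about how Amazon formats the review date, so both may need adjusting for other locales.

```javascript
// Hypothetical helper: pulls a date such as "October 27, 2023" out of the
// raw review-date text. The expected format is an assumption.
function extractReviewDate(rawReviewDate) {
  // word (month name), space, 1-2 digit day, comma, space, 4-digit year
  const datePattern = /(\w+\s\d{1,2},\s\d{4})/;
  const match = rawReviewDate.match(datePattern);
  // Drop the comma from the matched date, or report that nothing matched.
  return match ? match[0].replace(",", "") : "Date not found";
}

// The input string is an assumed example of the review-date text.
console.log(extractReviewDate("Reviewed in the United States on October 27, 2023"));
// prints "October 27 2023"
```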

Then combine all the data into a reviewData object and push it to the final list, reviewsData.

// Matches dates such as "October 27, 2023".
const datePattern = /(\w+\s\d{1,2},\s\d{4})/;
const match = rawReviewDate.match(datePattern);
const reviewDate = match ? match[0].replace(',', '') : "Date not found";

const reviewData = {
  author,
  title,
  reviewDate,
};

reviewsData.push(reviewData);

The above process will run until all reviews on the current page have been parsed. Here is the code snippet that parses the data:

// Final list that collects one object per review.
const reviewsData = [];

for (const reviewElement of reviewElements) {
  const author = await reviewElement.$eval(selectors.authorName, (element) => element.textContent);
  const title = await reviewElement.$eval(selectors.reviewTitle, (element) => element.textContent);
  const rawReviewDate = await reviewElement.$eval(selectors.reviewDate, (element) => element.textContent);

  // Extract a date such as "October 27, 2023" from the raw text.
  const datePattern = /(\w+\s\d{1,2},\s\d{4})/;
  const match = rawReviewDate.match(datePattern);
  const reviewDate = match ? match[0].replace(',', '') : "Date not found";

  const reviewData = {
    author,
    title,
    reviewDate,
  };

  reviewsData.push(reviewData);
}

Wonderful! The relevant data has been successfully parsed and is now in JSON format, as shown below.

[Screenshot: the scraped data in JSON format]
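
One caveat with the parsing loop: $eval throws when its selector matches nothing, so a single review with a missing title would stop the whole run. As an alternative to that loop (not the method used in this guide), the sketch below extracts all fields in one page.$$eval call and falls back to null for missing elements. The function name extractReviews is hypothetical, and it returns the raw date text, so the date pattern still has to be applied afterwards.

```javascript
// Hypothetical alternative to the per-review $eval loop.
// Runs inside the browser: `nodes` are the review elements and
// `sel` is the selector map passed in from Node.js.
function extractReviews(nodes, sel) {
  // Returns the trimmed text of the first match inside `node`, or null if none.
  const text = (node, selector) => {
    const found = node.querySelector(selector);
    return found ? found.textContent.trim() : null;
  };
  return nodes.map((node) => ({
    author: text(node, sel.authorName),
    title: text(node, sel.reviewTitle),
    rawReviewDate: text(node, sel.reviewDate),
  }));
}

// Usage with Puppeteer (selectors as defined earlier in this guide):
// const reviews = await page.$$eval(selectors.allReviews, extractReviews, selectors);
```

Because the whole function is serialized and executed in the page, it cannot reference variables from the Node.js script, which is why the selectors are passed as an argument.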

Step 4: Export reviews to CSV

The parsed reviews are in JSON format, which is somewhat human-readable. Converting this data to CSV format makes it easier to read and use for other purposes.

There are many ways to convert JSON data to CSV, but we’ll use a simple and effective approach. Here is a simple code snippet to convert JSON to CSV.

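const fs = require("fs");

let csvContent = "Author,Title,Date\n";
for (const review of reviewsData) {
  const { author, title, reviewDate } = review;
  // The title is quoted because it may contain commas.
  csvContent += `${author},"${title}",${reviewDate}\n`;
}

const csvFileName = "amazon_reviews.csv";
// writeFileSync is synchronous, so no await is needed.
fs.writeFileSync(csvFileName, csvContent, "utf8");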

The output of the CSV file is as follows:

[Screenshot: the JSON data converted to CSV format]
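
The simple conversion quotes only the title, so a title containing a double quote, or an author name containing a comma, would break the columns. A minimal sketch of a more defensive conversion follows, assuming the common CSV convention of wrapping every field in double quotes and doubling any quotes inside it. The helper names csvField and toCsv are hypothetical.

```javascript
// Hypothetical helpers, not part of the original script.
// Wraps a value in double quotes and doubles any embedded quotes.
function csvField(value) {
  return `"${String(value).replace(/"/g, '""')}"`;
}

// Builds the CSV text: a header row followed by one row per review.
function toCsv(reviewsData) {
  const rows = [["Author", "Title", "Date"]];
  for (const { author, title, reviewDate } of reviewsData) {
    rows.push([author, title, reviewDate]);
  }
  return rows.map((row) => row.map(csvField).join(",")).join("\n") + "\n";
}

// Usage:
// const fs = require("fs");
// fs.writeFileSync("amazon_reviews.csv", toCsv(reviewsData), "utf8");
```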

And that’s it!

The complete code is available on GitHub.

Conclusion

In this guide, you learned how to use Puppeteer to scrape Amazon product reviews after login. You learned how to log in, parse relevant data, and save it to a CSV file.

For further practice, you can use pagination to extract all reviews on all pages.
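
As a starting point for that exercise, here is a rough sketch of a pagination loop. It is not part of the original script and rests on several assumptions: the helper names collectAllPages and parseReviews are hypothetical, the "next page" selector li.a-last a is a guess at Amazon’s markup that should be checked in the browser’s developer tools, and clicking the link is assumed to trigger a full page navigation.

```javascript
// Hypothetical pagination helper, not part of the original script.
// parseCurrentPage(page) should return an array of reviews for the page shown.
async function collectAllPages(page, parseCurrentPage, nextSelector, maxPages = 5) {
  const allReviews = [];
  for (let pageNumber = 1; pageNumber <= maxPages; pageNumber++) {
    allReviews.push(...(await parseCurrentPage(page)));
    // page.$ resolves to null when no "next page" link is present.
    const nextLink = await page.$(nextSelector);
    if (!nextLink) break;
    // Start waiting before clicking so the navigation is not missed.
    await Promise.all([page.waitForNavigation(), nextLink.click()]);
  }
  return allReviews;
}

// Usage (the selector is an assumption about Amazon's markup):
// const reviews = await collectAllPages(page, parseReviews, "li.a-last a");
```

The maxPages limit keeps the loop bounded if the "next page" link never disappears.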

©2023 OzitechGroup – All Rights Reserved.