Prompt Title: Selenium web scraping in Java.

Created 1 year ago
Human: Take a Developer's persona and write a web scraping script which traverses a webpage and gets the name, email, designation, and phone number if available, using Selenium with Java
ChatGPT:

Sure, here's a sample script in Java using Selenium WebDriver that can scrape a webpage for name, email, designation, and phone number if available. This script assumes that the information is stored in HTML elements with specific IDs, classes, or other attributes that can be targeted using CSS selectors.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class WebScraper {
    public static void main(String[] args) {
        // Set the path to the ChromeDriver executable
        System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver");

        // Create a new instance of the ChromeDriver
        WebDriver driver = new ChromeDriver();

        try {
            // Navigate to the webpage to be scraped
            driver.get("https://example.com");

            // Find the elements containing the name, email, designation, and phone number
            WebElement nameElement = driver.findElement(By.id("name"));
            WebElement emailElement = driver.findElement(By.id("email"));
            WebElement designationElement = driver.findElement(By.className("designation"));
            WebElement phoneElement = driver.findElement(By.cssSelector("div.contact-info span.phone"));

            // Extract the text from the elements
            String name = nameElement.getText();
            String email = emailElement.getText();
            String designation = designationElement.getText();
            String phone = phoneElement.getText();

            // Print the extracted information to the console
            System.out.println("Name: " + name);
            System.out.println("Email: " + email);
            System.out.println("Designation: " + designation);
            System.out.println("Phone: " + phone);
        } finally {
            // Close the browser even if a locator fails
            driver.quit();
        }
    }
}
```

This script uses the ChromeDriver executable to launch a new instance of the Chrome browser and navigate to the webpage to be scraped. It then uses various methods from the Selenium WebDriver API to find the HTML elements containing the desired information, extract the text from those elements, and print the results to the console. Finally, it closes the browser to clean up resources.

Of course, this script will need to be modified to suit the specific webpage and the information to be scraped. Note also that web scraping can be legally and ethically complex, so it's important to understand the relevant laws and site policies before attempting to scrape any website.
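One caveat on the "if available" part of the prompt: findElement throws a NoSuchElementException when nothing matches, which would abort the scrape on any page missing, say, a phone number. Selenium's findElements returns an empty list instead, so a missing field can fall back to a placeholder. The pattern can be sketched with plain lists (the WebDriver calls are left out so the sketch stays self-contained; the helper name firstTextOrDefault is my own, not part of Selenium):

```java
import java.util.Collections;
import java.util.List;

public class OptionalField {
    // Mirrors the result of driver.findElements(locator): an empty list means
    // "not present", so the caller substitutes a fallback instead of having to
    // catch NoSuchElementException.
    static String firstTextOrDefault(List<String> matchTexts, String fallback) {
        return matchTexts.isEmpty() ? fallback : matchTexts.get(0);
    }

    public static void main(String[] args) {
        // Phone number present on the page
        System.out.println(firstTextOrDefault(List.of("555-0100"), "N/A"));
        // Phone number absent: an empty findElements result yields the fallback
        System.out.println(firstTextOrDefault(Collections.emptyList(), "N/A"));
    }
}
```

In the real script this would look like `List<WebElement> phones = driver.findElements(By.cssSelector("div.contact-info span.phone")); String phone = phones.isEmpty() ? "N/A" : phones.get(0).getText();`.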

Human: Modify it to take the links from an Excel sheet and write the output to an Excel sheet
ChatGPT:

Certainly! Here's a modified version of the previous script that reads input links from an Excel sheet and writes the scraped data to an output Excel sheet.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class WebScraper {
    public static void main(String[] args) throws IOException {
        // Set the path to the ChromeDriver executable
        System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver");

        // Create a new instance of the ChromeDriver
        WebDriver driver = new ChromeDriver();

        // Load the input Excel file
        FileInputStream inputFile = new FileInputStream(new File("input.xlsx"));
        Workbook inputWorkbook = new XSSFWorkbook(inputFile);
        Sheet inputSheet = inputWorkbook.getSheetAt(0);

        // Create the output workbook and sheet
        XSSFWorkbook outputWorkbook = new XSSFWorkbook();
        XSSFSheet outputSheet = outputWorkbook.createSheet("Output");

        // DataFormatter reads any cell type (string, numeric, blank) as text,
        // whereas getStringCellValue throws on non-string cells
        DataFormatter formatter = new DataFormatter();

        try {
            // Loop over the rows in the input Excel sheet
            for (Row row : inputSheet) {
                // Read the link from the first column; skip rows with no link
                String link = formatter.formatCellValue(row.getCell(0)).trim();
                if (link.isEmpty()) {
                    continue;
                }

                // Navigate to the webpage to be scraped
                driver.get(link);

                // Find the elements containing the name, email, designation, and phone number
                WebElement nameElement = driver.findElement(By.id("name"));
                WebElement emailElement = driver.findElement(By.id("email"));
                WebElement designationElement = driver.findElement(By.className("designation"));
                WebElement phoneElement = driver.findElement(By.cssSelector("div.contact-info span.phone"));

                // Extract the text from the elements
                String name = nameElement.getText();
                String email = emailElement.getText();
                String designation = designationElement.getText();
                String phone = phoneElement.getText();

                // Create a matching row in the output Excel sheet
                XSSFRow outputRow = outputSheet.createRow(row.getRowNum());

                // Write the extracted information to the output Excel sheet
                outputRow.createCell(0).setCellValue(name);
                outputRow.createCell(1).setCellValue(email);
                outputRow.createCell(2).setCellValue(designation);
                outputRow.createCell(3).setCellValue(phone);
            }

            // Save the output Excel file
            try (FileOutputStream outputStream = new FileOutputStream("output.xlsx")) {
                outputWorkbook.write(outputStream);
            }
        } finally {
            // Clean up the workbooks and the browser even if a page fails
            outputWorkbook.close();
            inputWorkbook.close();
            driver.quit();
        }
    }
}
```

This script uses the Apache POI library to read input links from an Excel sheet and write the scraped data to an output Excel sheet. It loops over the rows in the input sheet, reads the link from the first column, scrapes the desired information from the webpage, and writes the results to the output sheet. Finally, it saves the output Excel file and closes the browser.
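Both scripts compile only with the Selenium and Apache POI libraries on the classpath. In a Maven project, the dependencies would look roughly like this (the artifact coordinates are the official ones; the version placeholders should be replaced with current releases):

```xml
<dependencies>
  <!-- Selenium WebDriver bindings for Java -->
  <dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>REPLACE_WITH_CURRENT_VERSION</version>
  </dependency>
  <!-- Apache POI support for .xlsx workbooks (pulls in the POI core) -->
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>REPLACE_WITH_CURRENT_VERSION</version>
  </dependency>
</dependencies>
```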

Shared by Abhishek Vk