Sure, here's a sample script in Java using Selenium WebDriver that scrapes a webpage for a name, email, designation, and phone number (when available). The script assumes the information is stored in HTML elements with specific IDs, classes, or other attributes that can be targeted using locators such as CSS selectors.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class WebScraper {
    public static void main(String[] args) {
        // Set the path to the ChromeDriver executable
        System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver");

        // Create a new instance of the ChromeDriver
        WebDriver driver = new ChromeDriver();

        // Navigate to the webpage to be scraped
        driver.get("https://example.com");

        // Find the elements containing the name, email, designation, and phone number
        WebElement nameElement = driver.findElement(By.id("name"));
        WebElement emailElement = driver.findElement(By.id("email"));
        WebElement designationElement = driver.findElement(By.className("designation"));
        WebElement phoneElement = driver.findElement(By.cssSelector("div.contact-info span.phone"));

        // Extract the text from the elements
        String name = nameElement.getText();
        String email = emailElement.getText();
        String designation = designationElement.getText();
        String phone = phoneElement.getText();

        // Print the extracted information to the console
        System.out.println("Name: " + name);
        System.out.println("Email: " + email);
        System.out.println("Designation: " + designation);
        System.out.println("Phone: " + phone);

        // Close the browser
        driver.quit();
    }
}
This script uses the ChromeDriver executable to launch a new instance of the Chrome browser and navigate to the webpage to be scraped. It then uses various methods from the Selenium WebDriver API to find the HTML elements containing the desired information, extract the text from those elements, and print the results to the console. Finally, it closes the browser to clean up resources.
Of course, this script will need to be adapted to the structure of the specific webpage and the information you want to scrape. One practical wrinkle: findElement throws a NoSuchElementException when a locator matches nothing, so a field that is only sometimes present (such as the phone number) will abort the script on pages that lack it — see the sketch below for one way to handle that. Finally, web scraping can be legally and ethically complex, so make sure you understand the relevant laws and a site's terms of service before scraping it.
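Since findElements returns an empty list instead of throwing when nothing matches, it's the natural tool for optional fields. Below is a minimal sketch of such a helper, assuming the same placeholder selectors as above; the class and method names (ScraperUtils, textOrDefault) are purely illustrative:
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class ScraperUtils {
    // Returns the text of the first element matching the locator, or the
    // fallback value when the page contains no such element.
    static String textOrDefault(WebDriver driver, By locator, String fallback) {
        List<WebElement> matches = driver.findElements(locator);
        return matches.isEmpty() ? fallback : matches.get(0).getText();
    }
}
With this helper, the phone lookup becomes textOrDefault(driver, By.cssSelector("div.contact-info span.phone"), "N/A"), and a page without a phone number no longer kills the run.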
Certainly! Here's a modified version of the previous script that reads input links from an Excel sheet and writes the scraped data to an output Excel sheet, using the Apache POI library to handle the .xlsx files.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Iterator;

public class WebScraper {
    // The file I/O below throws checked IOExceptions, so main declares them
    public static void main(String[] args) throws Exception {
        // Set the path to the ChromeDriver executable
        System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver");

        // Create a new instance of the ChromeDriver
        WebDriver driver = new ChromeDriver();

        // Load the input Excel file
        FileInputStream inputFile = new FileInputStream(new File("input.xlsx"));
        Workbook inputWorkbook = new XSSFWorkbook(inputFile);
        Sheet inputSheet = inputWorkbook.getSheetAt(0);

        // Create the output Excel workbook and sheet
        XSSFWorkbook outputWorkbook = new XSSFWorkbook();
        XSSFSheet outputSheet = outputWorkbook.createSheet("Output");

        // Loop over the rows in the input Excel sheet
        Iterator<Row> rowIterator = inputSheet.iterator();
        while (rowIterator.hasNext()) {
            Row row = rowIterator.next();

            // Read the link from the first column
            String link = row.getCell(0).getStringCellValue();

            // Navigate to the webpage to be scraped
            driver.get(link);

            // Find the elements containing the name, email, designation, and phone number
            WebElement nameElement = driver.findElement(By.id("name"));
            WebElement emailElement = driver.findElement(By.id("email"));
            WebElement designationElement = driver.findElement(By.className("designation"));
            WebElement phoneElement = driver.findElement(By.cssSelector("div.contact-info span.phone"));

            // Extract the text from the elements
            String name = nameElement.getText();
            String email = emailElement.getText();
            String designation = designationElement.getText();
            String phone = phoneElement.getText();

            // Create a row in the output sheet at the same index as the input row
            XSSFRow outputRow = outputSheet.createRow(row.getRowNum());

            // Write the extracted information to the output Excel sheet
            outputRow.createCell(0).setCellValue(name);
            outputRow.createCell(1).setCellValue(email);
            outputRow.createCell(2).setCellValue(designation);
            outputRow.createCell(3).setCellValue(phone);
        }

        // Close the input workbook and its stream
        inputWorkbook.close();
        inputFile.close();

        // Save and close the output Excel file
        FileOutputStream outputStream = new FileOutputStream("output.xlsx");
        outputWorkbook.write(outputStream);
        outputStream.close();
        outputWorkbook.close();

        // Close the browser
        driver.quit();
    }
}
This script uses the Apache POI library to read input links from an Excel sheet and write the scraped data to an output Excel sheet. It loops over the rows in the input sheet, reads the link from the first column, scrapes the desired information from each page, and writes the results to the matching row of the output sheet. Finally, it saves the output Excel file and closes the browser. One thing to watch: getCell(0).getStringCellValue() assumes every row has a non-empty first cell containing text, and will fail on blank rows or numeric cells — the sketch below shows a more defensive way to read the link.
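Specifically, getStringCellValue throws an IllegalStateException on numeric or formula cells, and getCell(0) returns null for blank cells. A more defensive read uses POI's DataFormatter, which renders any cell type as the string Excel would display. Here's a minimal sketch; the class and method names (SheetUtils, readLink) are purely illustrative:
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;

public class SheetUtils {
    private static final DataFormatter FORMATTER = new DataFormatter();

    // Reads the first cell of a row as the string Excel would display,
    // returning null for blank cells so the caller can skip the row.
    static String readLink(Row row) {
        Cell cell = row.getCell(0);
        if (cell == null) {
            return null;
        }
        String value = FORMATTER.formatCellValue(cell).trim();
        return value.isEmpty() ? null : value;
    }
}
In the scraping loop, String link = readLink(row); if (link == null) continue; skips blank rows cleanly; it's also worth wrapping each iteration in its own try/catch so one broken page doesn't abandon the rest of the batch.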