Prompt Title: Compress any text in folder using GPT-4

Created 1 year ago
Votes 0
Views 383 times
IT
0

Prompt Detail:

To modify the script to compress the text extracted from various file types in a given folder, you can use the following approach:

  1. Use libraries such as docx2txt, PyPDF2, and other parsers to read the contents of different file types.
  2. Create a function that takes a file path as input, reads the file contents, and returns the extracted text.
  3. Traverse the folder structure and process each file based on its extension, calling the appropriate reader function.
  4. For each file, call the compress_text_gpt4 function and store the compressed content.

Here's a modified version of the script to handle various file types in a given folder:

python
import openai import os import docx2txt import PyPDF2 # Add other parsers for different file types as needed def read_txt_file(file_path): with open(file_path, 'r', encoding='utf-8') as f: return f.read() def read_docx_file(file_path): return docx2txt.process(file_path) def read_pdf_file(file_path): with open(file_path, 'rb') as f: reader = PyPDF2.PdfFileReader(f) text = '' for page_num in range(reader.numPages): text += reader.getPage(page_num).extractText() return text # Add other reader functions for different file types as needed def process_files(directory): for root, dirs, files in os.walk(directory): for file in files: file_path = os.path.join(root, file) # Determine file type and read contents _, file_extension = os.path.splitext(file) file_extension = file_extension.lower() if file_extension == ".txt": content = read_txt_file(file_path) elif file_extension == ".docx": content = read_docx_file(file_path) elif file_extension == ".pdf": content = read_pdf_file(file_path) # Add more file types as needed else: continue # Skip unsupported file types compressed_content = compress_text_gpt4(text=content) print(f"Original Text ({file}):") print(content) print(f"\nCompressed Text ({file}):") print(compressed_content) print("\n" + "-" * 80 + "\n") # Example usage if __name__ == "__main__": target_directory = "path/to/your/folder" process_files(target_directory)

Remember to install the required libraries using pip:

bash
pip install docx2txt PyPDF2

This script will process all supported file types in the specified folder and print the original and compressed content. You can modify the script to store the compressed content in files or perform other tasks as needed. Note that processing large folders with many files might consume a significant amount of API requests and take a long time to complete.

avatar
Francy
Shared 1 prompt
Created 1 year ago

Leave a Comment