Parsing - Introduction by Examples

We introduce llmware through self-contained examples.

🚀 Parsing Examples 🚀

Parsing is the Humble Hero of Good RAG Pipelines

LLMWare supports parsing a wide range of unstructured content types and treats parsing, text chunking, and indexing as the first step in the pipeline. As with any pipeline, care and attention to getting “great input” is usually the key to “great output.”

In this repository, we show several key features of parsing with llmware:

Parsing PDFs like a Pro

Configuring text chunking and extraction parameters - PDF Configuration
PDF Table extraction - PDF Table
Fallback to OCR - PDF-by-OCR
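
To make the idea of text chunking parameters concrete, here is a minimal, library-independent sketch of sentence-aware chunking with a target size and a hard maximum. It is an illustration only and not llmware's implementation: the function name `chunk_text` and the parameters `target_size` and `max_size` are hypothetical, chosen to mirror the general idea of a preferred chunk size plus an upper bound. See the PDF Configuration example for the parser's actual options.

```python
import re


def chunk_text(text, target_size=400, max_size=600):
    """Split text into chunks near target_size characters, never above max_size.

    Hypothetical helper for illustration; sizes are measured in characters.
    """
    if target_size <= 0 or max_size < target_size:
        raise ValueError("require 0 < target_size <= max_size")

    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than max_size is cut into fixed-width pieces.
        pieces = [sentence[i:i + max_size] for i in range(0, len(sentence), max_size)]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_size:
                current = candidate
            else:
                # Adding this piece would exceed the hard limit: close the chunk.
                chunks.append(current)
                current = piece
            # Close the chunk as soon as the target size has been reached.
            if len(current) >= target_size:
                chunks.append(current)
                current = ""

    if current:
        chunks.append(current)
    return chunks
```

A smaller target tends to give more precise retrieval hits, while a larger one keeps more context per chunk; the hard maximum guards against a single oversized block.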

Parsing Office Documents (PowerPoint, Word, Excel)

Configuring text chunking and extraction parameters - Office Configuration
Handling ZIPs and mixed file types - Microsoft IR Documents
Running OCR on Images Extracted - OCR Embedded Doc Images

Parsing without a Database

Parse in Memory - Parse in Memory
Parse directly into a Prompt - Parse in Prompt
Parse to JSON file - Parse to JSON
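
As a rough picture of what parsing without a database can look like, the sketch below writes parsed text chunks to a JSON Lines file and reads them back using only the Python standard library. It is an assumption-laden illustration, not llmware's output format: the helper names and the record fields (`doc`, `block_id`, `text`) are hypothetical. The Parse to JSON example shows the library's own approach.

```python
import json
from pathlib import Path


def write_chunks_to_jsonl(chunks, source_name, out_path):
    """Write one JSON record per chunk (hypothetical schema) and return the path."""
    out_path = Path(out_path)
    with out_path.open("w", encoding="utf-8") as f:
        for block_id, text in enumerate(chunks):
            record = {"doc": source_name, "block_id": block_id, "text": text}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path


def read_chunks_from_jsonl(path):
    """Load the records back as a list of dicts, skipping blank lines."""
    with Path(path).open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Keeping one record per line makes the file easy to stream, append to, or load later into a database if one becomes useful.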

Other Content Types

Custom CSV - Custom CSV files
Custom JSON - Custom JSON files
Images - OCR on Images
Web/HTML - Website Extraction
Voice (WAV) - in Use_Cases - Parsing Great Speeches

For more examples, see the [parsing examples](https://www.github.com/llmware-ai/llmware/tree/main/examples/Parsing/) in the main repo.

Check back often: we update these examples regularly, and many of them have companion videos as well.

Let’s get started! 🚀