PII Scanner

Advanced Text Processing with SpaCy and Pattern Matching

Supported Formats

📋 Lists: Process data lists
📝 Plain Text: Handle text files
📄 PDF: Extract from PDFs
🔄 JSON: Parse JSON data
📊 CSV: Process spreadsheets
📑 XLSX: Excel file support
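As a rough illustration of how a single entry point can accept several of these formats, the sketch below dispatches on the file extension using only the standard library. It is not the loader used by PII Scanner: `load_texts` and `_flatten` are hypothetical names, and PDF and XLSX are left out because they need third-party parsers.

```python
import csv
import json
from pathlib import Path


def load_texts(path: str) -> list[str]:
    """Return the text values found in a .txt, .json, or .csv file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        # One entry per non-empty line.
        with open(path, encoding="utf-8") as fh:
            return [line.strip() for line in fh if line.strip()]
    if suffix == ".json":
        with open(path, encoding="utf-8") as fh:
            return list(_flatten(json.load(fh)))
    if suffix == ".csv":
        # One entry per non-empty cell, row by row.
        with open(path, newline="", encoding="utf-8") as fh:
            return [cell for row in csv.reader(fh) for cell in row if cell]
    raise ValueError(f"unsupported format: {suffix}")


def _flatten(value):
    """Yield every non-null scalar inside nested dicts and lists as a string."""
    if isinstance(value, dict):
        for item in value.values():
            yield from _flatten(item)
    elif isinstance(value, list):
        for item in value:
            yield from _flatten(item)
    elif value is not None:
        yield str(value)
```

Every branch returns a flat list of strings, so whatever scans the text afterwards does not need to know which format the data came from.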

Installation

Install PII Scanner using pip:

pip install pii-scanner

Quick Start

import asyncio

from pii_scanner.scanner import PIIScanner
from pii_scanner.constants.patterns_countries import Regions


async def run_scan():
    pii_scanner = PIIScanner()
    results = await pii_scanner.scan(
        file_path='test.pdf',
        sample_size=0.005,
        region=Regions.IN
    )
    print(results)


asyncio.run(run_scan())
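The Quick Start awaits a single scan. Because `scan` is a coroutine, several scans can in principle be awaited together with `asyncio.gather`. The sketch below shows that pattern with `StubScanner`, a hypothetical stand-in that only mimics the keyword arguments used above, so the code runs without the package installed. `scan_many` is also a hypothetical helper, and whether one `PIIScanner` instance can safely serve concurrent calls is an assumption to check against the library.

```python
import asyncio


class StubScanner:
    """Hypothetical stand-in for PIIScanner so the sketch runs anywhere."""

    async def scan(self, file_path: str, sample_size: float, region: str) -> dict:
        await asyncio.sleep(0.01)  # simulate I/O-bound work
        return {"file": file_path, "region": region, "sample_size": sample_size}


async def scan_many(scanner, paths, sample_size=0.005, region="IN"):
    """Start one scan per path and wait for all of them together."""
    tasks = [
        scanner.scan(file_path=p, sample_size=sample_size, region=region)
        for p in paths
    ]
    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(scan_many(StubScanner(), ["a.pdf", "b.csv", "c.json"]))
    for r in results:
        print(r)
```

To try this against the real scanner, pass a `PIIScanner()` instance and a `Regions` member in place of the stub and the `"IN"` string.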

Key Features

Asynchronous Processing: Scan multiple texts concurrently instead of one at a time
🌍 Region-Specific Matching: Apply localized regex patterns for precise detection
🔄 Multiple Formats: Process all supported file types through a single scanner
🤖 Pre-installed NLTK: Ready to use, with all required datasets included
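Region-specific matching can be pictured as a table of regular expressions keyed by region, where only the selected region's patterns are applied. The sketch below is a minimal illustration under that assumption. `REGION_PATTERNS` and `find_pii` are hypothetical names, and the patterns are simplified format checks for Indian PAN and Aadhaar numbers and US Social Security numbers, not the ones shipped with PII Scanner.

```python
import re

# Illustrative, simplified patterns; not the ones shipped with pii-scanner.
REGION_PATTERNS = {
    "IN": {
        # PAN: five letters, four digits, one letter.
        "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
        # Aadhaar: twelve digits, optionally grouped 4-4-4, first digit 2-9.
        "AADHAAR": re.compile(r"\b[2-9]\d{3}\s?\d{4}\s?\d{4}\b"),
    },
    "US": {
        # SSN: three digits, two digits, four digits, hyphen-separated.
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    },
}


def find_pii(text: str, region: str) -> list[tuple[str, str]]:
    """Return (label, match) pairs using only the given region's patterns."""
    hits = []
    for label, pattern in REGION_PATTERNS.get(region, {}).items():
        hits.extend((label, m.group()) for m in pattern.finditer(text))
    return hits


if __name__ == "__main__":
    sample = "PAN ABCDE1234F, SSN 123-45-6789, Aadhaar 2345 6789 0123"
    print(find_pii(sample, "IN"))
    # [('PAN', 'ABCDE1234F'), ('AADHAAR', '2345 6789 0123')]
    print(find_pii(sample, "US"))
    # [('SSN', '123-45-6789')]
```

Restricting the search to one region keeps a pattern written for one country from flagging look-alike numbers in another country's data, which is the practical benefit of localized matching.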