Supported Formats
Lists
Process data lists
Plain Text
Handle text files
PDF
Extract from PDFs
JSON
Parse JSON data
CSV
Process spreadsheets
XLSX
Excel file support
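One way to picture multi-format support is to reduce each input to a flat list of strings before any matching happens. The sketch below does this for lists, JSON, and CSV using only the standard library. The helper names (`flatten`, `texts_from_json`, `texts_from_csv`) are hypothetical and are not part of PII Scanner; the package may handle these formats differently. PDF and XLSX need third-party parsers and are left out.

```python
import csv
import io
import json


def flatten(value):
    """Yield every scalar inside nested lists/dicts as a string (hypothetical helper)."""
    if isinstance(value, dict):
        for item in value.values():
            yield from flatten(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from flatten(item)
    elif value is not None:
        yield str(value)


def texts_from_json(raw):
    """Parse a JSON string and return its scalar values as text."""
    return list(flatten(json.loads(raw)))


def texts_from_csv(raw):
    """Parse CSV text and return every non-empty cell, row by row."""
    reader = csv.reader(io.StringIO(raw))
    return [cell for row in reader for cell in row if cell]
```

Each helper returns a plain list of strings, so the same matching step can run on the result regardless of the original format.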
Installation
Install PII Scanner using pip:
pip install pii-scanner
Quick Start
import asyncio
from pii_scanner.scanner import PIIScanner
from pii_scanner.constants.patterns_countries import Regions

async def run_scan():
    pii_scanner = PIIScanner()
    results = await pii_scanner.scan(
        file_path='test.pdf',
        sample_size=0.005,
        region=Regions.IN
    )
    print(results)

asyncio.run(run_scan())
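Because `scan` is awaited, several files can be scanned concurrently with `asyncio.gather`. The following is a minimal sketch, not taken from the package documentation. `StubScanner` is a hypothetical stand-in so the example runs without `pii-scanner` installed, and `scan_many` is an illustrative helper name. The keyword arguments mirror the Quick Start call above.

```python
import asyncio


class StubScanner:
    """Hypothetical stand-in for PIIScanner; it only echoes its inputs."""

    async def scan(self, file_path, sample_size, region):
        await asyncio.sleep(0.01)  # simulate I/O-bound work
        return {"file": file_path, "region": region}


async def scan_many(scanner, file_paths, sample_size=0.005, region="IN"):
    # Create one scan coroutine per file and await them together.
    # gather() returns results in the same order as file_paths.
    tasks = [
        scanner.scan(file_path=path, sample_size=sample_size, region=region)
        for path in file_paths
    ]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(scan_many(StubScanner(), ["a.pdf", "b.csv", "c.json"]))
    for result in results:
        print(result)
```

In real use, `StubScanner()` would presumably be replaced by `PIIScanner()` and the `region` string by `Regions.IN`, assuming `scan` accepts the same keyword arguments shown in the Quick Start.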
Key Features
Asynchronous Processing
Scan multiple texts concurrently rather than one at a time
Region-Specific Matching
Apply localized regex patterns for precise detection
Multiple Formats
Process all supported file types with a single tool
Pre-installed NLTK
Ready to use with all required datasets included
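Region-specific matching can be pictured as a lookup from a region code to a set of regular expressions. The sketch below is illustrative only: `REGION_PATTERNS` and `find_pii` are hypothetical names, and the patterns are simplified versions of well-known formats (Indian PAN and Aadhaar, US SSN), not the patterns shipped with PII Scanner.

```python
import re

# Illustrative patterns only; the library's own pattern set may differ.
REGION_PATTERNS = {
    "IN": {
        # PAN: five letters, four digits, one letter
        "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
        # Aadhaar: twelve digits, optionally written in groups of four
        "AADHAAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
    },
    "US": {
        # SSN: three digits, two digits, four digits, hyphen-separated
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    },
}


def find_pii(text, region):
    """Return (label, matched_text) pairs for the given region's patterns."""
    hits = []
    for label, pattern in REGION_PATTERNS.get(region, {}).items():
        hits.extend((label, m.group()) for m in pattern.finditer(text))
    return hits
```

Keeping patterns grouped by region means only the expressions relevant to the selected region are applied to a text, which can reduce false positives from formats that belong to other countries.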