improve-diarization-with-llm

This tool can take a long diarized transcript (more than 10 hours of content) and improve the diarization by prompting an LLM to find obviously incorrect speaker attributions and fix them. Credit for the idea goes to this paper: https://arxiv.org/html/2401.03506v4
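A transcript longer than 10 hours is likely too long to correct in a single LLM response, so one plausible approach is to split it into chunks and prompt the model once per chunk. The sketch below illustrates that idea under stated assumptions. It is not this package's implementation and makes no API calls. It assumes a hypothetical one-utterance-per-line `Speaker: text` format, and `parse_transcript`, `chunk_transcript`, `build_prompt` and `max_chars` are illustrative names only.

```python
from typing import List, Tuple

# Hypothetical input format (an assumption, not the package's format):
# one utterance per line, written as "Speaker: text".
Utterance = Tuple[str, str]


def parse_transcript(text: str) -> List[Utterance]:
    """Parse 'Speaker: text' lines into (speaker, text) pairs, skipping blank lines."""
    utterances: List[Utterance] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so colons inside the text survive.
        speaker, sep, content = line.partition(":")
        if not sep:
            # No speaker label on this line: attribute it to an unknown speaker.
            speaker, content = "UNKNOWN", line
        utterances.append((speaker.strip(), content.strip()))
    return utterances


def chunk_transcript(utterances: List[Utterance], max_chars: int = 4000) -> List[List[Utterance]]:
    """Group whole utterances into chunks whose rendered size stays within max_chars.

    A single utterance longer than max_chars becomes its own chunk.
    """
    chunks: List[List[Utterance]] = []
    current: List[Utterance] = []
    size = 0
    for speaker, content in utterances:
        line_len = len(speaker) + len(content) + 3  # ": " separator plus newline
        if current and size + line_len > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append((speaker, content))
        size += line_len
    if current:
        chunks.append(current)
    return chunks


def build_prompt(chunk: List[Utterance]) -> str:
    """Wrap one chunk in an instruction asking the model to fix only obvious mislabels."""
    body = "\n".join(f"{s}: {c}" for s, c in chunk)
    return (
        "The following transcript was produced by automatic speaker diarization.\n"
        "Fix speaker labels only where the attribution is obviously wrong. "
        "Do not change the spoken words. Return the transcript in the same format.\n\n"
        + body
    )
```

Each prompt would then be sent to the model and the returned lines written back in order. Splitting only on utterance boundaries keeps every utterance intact, but a mislabeled turn at a chunk edge loses context from its neighbors; overlapping chunks are one way to reduce that.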

Install

pip install improve_diarization_with_llm

How to use

import os
from improve_diarization_with_llm import claude_corrector

# The corrector reads the key from the ANTHROPIC_API_KEY environment variable.
# Exporting it in your shell avoids hard-coding the key in source files.
os.environ['ANTHROPIC_API_KEY'] = 'your-api-key'  # Replace with your actual API key
input_file = 'path/to/your/input/transcript.txt'  # Replace with your actual input file path
output_file = 'path/to/your/output/improved_transcript.txt'  # Replace with your desired output file path

corrector = claude_corrector.ClaudeDiarizationCorrector(input_file, output_file)

# Requires a valid ANTHROPIC_API_KEY and an existing input file.
corrector.process_conversation()
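`process_conversation()` assumes a valid API key and input path. A small standard-library pre-flight check can catch the most common setup mistakes before any request is made. `check_prerequisites` is a hypothetical helper, not part of `improve_diarization_with_llm`. It only checks that the key is set (and is not the `'your-api-key'` placeholder) and that the input file exists and is non-empty; it cannot confirm that Anthropic will accept the key.

```python
import os
from pathlib import Path
from typing import List


def check_prerequisites(input_file: str, env_var: str = "ANTHROPIC_API_KEY") -> List[str]:
    """Return a list of setup problems; an empty list means the basic checks passed.

    Hypothetical helper: it checks only that the key is present, not that it is valid.
    """
    problems: List[str] = []
    key = os.environ.get(env_var, "").strip()
    if not key or key == "your-api-key":
        problems.append(f"{env_var} is not set (or still holds the placeholder)")
    path = Path(input_file)
    if not path.is_file():
        problems.append(f"input file not found: {input_file}")
    elif path.stat().st_size == 0:
        problems.append(f"input file is empty: {input_file}")
    return problems
```

Call `check_prerequisites(input_file)` before `corrector.process_conversation()` and stop with the returned messages if the list is not empty.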