<?php
namespace boru\boruai\OCR\Methods;

use boru\boruai\OCR\Agents\OCRAgent;

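/**
 * Thin wrapper around OCRAgent that preloads the default OCR
 * instructions and message defined below.
 */
class AIAgent {

    /**
     * Default instructions handed to OCRAgent in the constructor.
     *
     * @var string
     */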
    public static $instructions = "You are an assistant AI tasked with extracting every visible word, character, and marking from each page of a document using the most accurate method available.

🔑 Core Extraction Rules
Use image-based OCR for all pages where an image is available.

Do this regardless of any embedded or extracted text, unless the image is completely missing.

Fall back to extracted text only if the image is missing or cannot be rendered.

🔍 Obfuscation/Corruption Handling
If embedded/extracted/parsed text for a page is:

unreadable,

garbled or symbol-heavy,

non-English,

or appears encoded in a non-standard format,

→ Treat this as obfuscated or corrupted and process the page image using image-based OCR instead.

Do not rely on any parsed content from such pages. Always re-OCR directly from the image.

📜 Output Protocol
Maintain exact document page order.

Include all visual content: typed, stamped, handwritten, headers, footers, annotations, signatures, etc.

Do not summarize, interpret, skip, or paraphrase.

🧾 Output Format
Begin full output with: [BEGIN DOCUMENT OCR OUTPUT]

For each page:

Start with [page #]

End with [end of page #]

End entire document with: [END DOCUMENT OCR OUTPUT]";

    /**
     * Default message handed to OCRAgent in the constructor.
     *
     * @var string
     */
    public static $message = "Extract and return the complete OCR text for every page of the attached document.

If the file is unreadable, corrupted, or yields garbled/encoded/obfuscated text:
→ Automatically switch to image-based OCR for the affected pages or full file.

Use image-based OCR for each page whenever a page image is available, even if extracted or embedded text exists.
Only fall back to extracted text if no image can be generated.

Include every visible character, word, stamp, number, handwritten note, file stamp, annotation, and signature—regardless of position (header, footer, margin, etc.).

Do not interpret, summarize, paraphrase, or omit any part of the text.

Maintain strict page order and use the following output format:

Start of document: [BEGIN DOCUMENT OCR OUTPUT]

Per page: [page #] ... [end of page #]

End of document: [END DOCUMENT OCR OUTPUT]";

    /**
     * Underlying agent that performs the OCR run.
     *
     * @var OCRAgent|null
     */
    private $aiOCR = null;

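    /**
     * Create the underlying OCRAgent and load the default prompts into it.
     *
     * @param string $pdfPath   Path to the PDF, passed through to OCRAgent.
     * @param string $reference Reference value passed through to OCRAgent.
     */
    public function __construct($pdfPath, $reference = "") {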
        $this->aiOCR = new OCRAgent($pdfPath, $reference);
        $this->aiOCR->instructions(static::$instructions);
        $this->aiOCR->message(static::$message);
    }

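    /**
     * Replace the instructions passed to the underlying OCRAgent.
     *
     * @param string $instructions
     */
    public function instructions($instructions) {
        $this->aiOCR->instructions($instructions);
    }

    /**
     * Replace the message passed to the underlying OCRAgent.
     *
     * @param string $message
     */
    public function message($message) {
        $this->aiOCR->message($message);
    }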

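    /**
     * Run the OCR agent.
     *
     * @param mixed|null $reference Optional reference forwarded to OCRAgent::run().
     * @return mixed Whatever OCRAgent::run() returns.
     */
    public function run($reference = null) {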
        return $this->aiOCR->run($reference);
    }
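
    /**
     * Illustrative sketch, not part of the original class: splits text that
     * follows the "Output Format" described in $instructions into per-page text.
     *
     * Assumptions: the OCR result is available as a string, and the model
     * replaces "#" in "[page #]" / "[end of page #]" with the page number.
     * The method name parsePages is hypothetical.
     *
     * @param string $output Raw OCR output text.
     * @return array<int,string> Page text keyed by page number.
     */
    public static function parsePages($output) {
        $pages = [];
        // \1 requires the closing marker to carry the same page number as the
        // opening marker; the "s" flag lets "." match across newlines.
        $pattern = '/\[page (\d+)\](.*?)\[end of page \1\]/s';
        if (preg_match_all($pattern, (string) $output, $matches, PREG_SET_ORDER)) {
            foreach ($matches as $match) {
                $pages[(int) $match[1]] = trim($match[2]);
            }
        }
        return $pages;
    }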
}
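
// Usage sketch. The path and reference are placeholder values, and the type
// of the value returned by run() depends on OCRAgent, which is not shown in
// this file.
//
//   use boru\boruai\OCR\Methods\AIAgent;
//
//   $agent = new AIAgent('/path/to/document.pdf', 'example-reference');
//
//   // Optionally override the default prompts before running.
//   $agent->message(AIAgent::$message . "\n\nAdditional prompt text can be appended here.");
//
//   $result = $agent->run();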