diff --git a/README.md b/README.md
new file mode 100644
index 0000000..affecb7
--- /dev/null
+++ b/README.md
@@ -0,0 +1,195 @@
+# Belegimport
+
+Automatically imports receipts (invoices, credit notes) from several sources and forwards them via SMTP to accounting software (e.g. Buchhaltungsbutler).
+
+## Features
+
+- **Scan upload**: Upload a PDF; documents are split automatically at QR-code separator pages
+- **IMAP**: Automatic retrieval of receipts from email mailboxes
+- **SMB/network drive**: Automatic retrieval of receipts from network folders
+- **Amazon Business**: Automatic retrieval of Amazon invoices via API
+- **Incoming/outgoing receipts**: Separate import addresses for purchasing and sales
+- **Scheduler**: Automatic retrieval at configurable intervals
+- **Processing log**: Overview of all imported receipts with their status
+
+## Technology
+
+- Python 3.12, FastAPI, Jinja2, SQLite (aiosqlite)
+- Docker / docker-compose
+- Playwright (optional, for Amazon browser automation as a fallback)
+
+## Installation
+
+### Docker (recommended)
+
+```bash
+git clone
+cd lex-office-belegimport-mail
+sudo docker-compose up --build -d
+```
+
+The web application is available at `http://localhost:8081`
+
+### Configuration
+
+All settings are made in the web interface:
+
+1. **Einstellungen** (`/settings`): SMTP server, IMAP, SMB, import email addresses
+2. **Plattformen** (`/platforms`): Amazon Business API credentials
+3. **Scan-Upload** (`/`): Manual PDF upload with receipt type selection
+
+---
+
+## Amazon Business API Setup
+
+The Amazon integration uses the official Amazon Business API (Reconciliation + Document API) to retrieve invoices automatically. No browser login, no CAPTCHAs, fully automated.
+
+### Prerequisites
+
+- Amazon Business account (with Business Prime)
+- Access to the [Amazon Solution Provider Portal](https://solutionproviderportal.amazon.com/)
+
+### Step 1: Register as a developer
+
+1. Open the [Solution Provider Portal](https://solutionproviderportal.amazon.com/)
+2. Select **"Private seller applications"** (for your own integrations)
+3. Select **"Erstellen Sie Anwendungen, die SP-APIs verwenden"**
+4. Fill in the company details (name, commercial register number, address)
+5. Verify via SMS
+
+### Step 2: Select roles
+
+Select the following roles:
+
+- **Abgleichen von Business-Einkäufen** (Reconciliation API)
+- **Amazon Business-Bestellung** (Business Orders API)
+
+### Step 3: Security controls
+
+Answer all security questions with **"Ja"**.
+
+For the text fields:
+- **Externe Parteien**: `Keine. Die Daten werden ausschließlich intern für die eigene Buchhaltung verwendet.`
+- **Externe Quellen**: `Keine.`
+
+### Step 4: Register the app
+
+After approval (which can take a few days):
+
+1. In Developer Central: **"+ Neuen App-Client hinzufügen"**
+2. Settings:
+   - **App-Name**: `Beleg import` (or a name of your choice)
+   - **API-Typ**: `SP-API`
+   - **App-Typ**: `Produktion`
+   - **Amazon Business**: checked
+   - **Verkäufer**: unchecked
+   - **Rollen**: Abgleichen von Business-Einkäufen + Amazon Business-Bestellung
+   - **RDT**: No
+   - **OAuth-Anmeldungs-URI**: `https://ihre-domain.de/api/amazon-oauth-callback`
+   - **OAuth-Umleitungs-URI**: `https://ihre-domain.de/api/amazon-oauth-callback`
+
+3. After saving: **"Anmeldedaten für Login mit Amazon" -> "Anzeigen"**
+   - Note the **Client-ID** (`amzn1.application-oa2-client.xxxxx`)
+   - Note the **Client-Sicherheitsschluessel** (`amzn1.oa2-cs.v1.xxxxx`)
+
+4. The **App-ID** (`amzn1.sp.solution.xxxxx`) is shown in the app overview below the app name
+
+> **Note**: The OAuth redirect URI must be a real domain with a top-level domain.
+> Amazon does not accept `localhost` or `.local` domains.
+> The URI does not have to be publicly reachable; Amazon only redirects the user's browser to it.
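
The constraints in the note above can be checked before a URI is entered in the Amazon app. The following is a minimal sketch and not part of the Belegimport code: `is_plausible_redirect_uri` is a hypothetical helper, and its rules (a dotted host name, no `localhost`, no `.local`, no IP literal) only approximate what the note describes. Amazon's actual validation may differ.

```python
import ipaddress
from urllib.parse import urlsplit


def is_plausible_redirect_uri(uri: str) -> bool:
    """Heuristic check for the rules stated in the note (hypothetical helper)."""
    host = (urlsplit(uri).hostname or "").lower().rstrip(".")
    if not host or host == "localhost":
        return False
    # Assumption: IP literals do not count as "a real domain".
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass
    labels = host.split(".")
    # Needs at least "name.tld", and the last label must not be "local".
    return len(labels) >= 2 and all(labels) and labels[-1] != "local"


if __name__ == "__main__":
    for candidate in (
        "https://ihre-domain.de/api/amazon-oauth-callback",
        "http://localhost:8081/api/amazon-oauth-callback",
        "https://belegimport.local/api/amazon-oauth-callback",
    ):
        print(candidate, "->", is_plausible_redirect_uri(candidate))
```

A check like this only catches obvious mistakes early; the authoritative answer is whether the Amazon app form accepts the URI.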
+
+> **Error SPSA0404**: If the error "Keine unterstuetzte Geschaeftseinheit" appears during authorization,
+> the authorization must go through the OAuth flow (website) instead of self-authorization.
+> Belegimport supports this automatically.
+
+### Step 5: Configure Belegimport
+
+1. Open the Plattformen page in Belegimport
+2. Set **Abruf-Modus** to **"API (empfohlen)"**
+3. Enter:
+   - **App-ID**: `amzn1.sp.solution.xxxxx`
+   - **Client-ID**: `amzn1.application-oa2-client.xxxxx`
+   - **Client-Sicherheitsschluessel**: the secret value
+4. Click **Einstellungen speichern**
+5. Click **"Bei Amazon autorisieren"**
+6. Sign in to Amazon and grant access
+7. Copy the `spapi_oauth_code` (or the entire URL) from the browser's address bar
+8. Enter the code in Belegimport and click **"Token tauschen"**
+9. The status should change to **"API autorisiert"**
+
+### Step 6: Retrieve invoices
+
+- **Manual**: Click "Jetzt Rechnungen abrufen"
+- **Automatic**: Enable the scheduler under Einstellungen (e.g. every 60 minutes)
+
+The invoices are sent as PDFs via SMTP to the configured address for incoming receipts.
+Invoices that have already been retrieved are skipped automatically.
+
+### OAuth Redirect URI (local installation)
+
+Because Amazon does not accept `localhost` URIs, there are two options:
+
+**Option A: Use your own domain (recommended)**
+
+Enter a real domain (e.g. `https://ihre-domain.de/api/amazon-oauth-callback`).
+After the Amazon authorization, the browser redirects there. The page will not load,
+but the auth code is visible in the address bar. Enter this code in Belegimport.
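
Step 7 allows pasting either the bare code or the whole redirect URL. The sketch below shows one way such input could be normalised. `extract_auth_code` is a hypothetical helper and not necessarily how Belegimport handles it. The parameter names `spapi_oauth_code` and `code` are the ones the `/api/amazon-oauth-callback` handler in this change reads.

```python
from urllib.parse import parse_qs, urlsplit


def extract_auth_code(pasted: str) -> str:
    """Return the authorization code from a bare code or a full redirect URL."""
    text = pasted.strip()
    is_url = "://" in text
    # Accept a full URL or just its query string.
    query = urlsplit(text).query if is_url else text.lstrip("?")
    params = parse_qs(query)
    for key in ("spapi_oauth_code", "code"):
        values = params.get(key)
        if values:
            return values[0]
    # No known parameter found: a URL without a code yields "",
    # anything else is treated as a bare code.
    return "" if is_url else text


if __name__ == "__main__":
    url = "https://ihre-domain.de/api/amazon-oauth-callback?state=auth&spapi_oauth_code=ABC123"
    print(extract_auth_code(url))
    print(extract_auth_code("ABC123"))
```

Returning an empty string for a URL without a code lets the caller show a "no authorization code received" message instead of sending a useless value to the token endpoint.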
+
+**Option B: /etc/hosts entry**
+
+If the server should be reachable locally:
+
+```bash
+# Add to /etc/hosts:
+127.0.0.1 app.belegimport.de
+```
+
+Then enter this as the redirect URI in the Amazon app:
+`https://app.belegimport.de/api/amazon-oauth-callback`
+
+> **Caution**: Amazon checks whether the domain has a top-level domain.
+> `.local` does not work, but `.de` does.
+
+### Environment variables
+
+In `docker-compose.yml`:
+
+```yaml
+environment:
+  - OAUTH_REDIRECT_BASE=https://ihre-domain.de  # Must match the Amazon app
+```
+
+---
+
+## Incoming and outgoing receipts
+
+Belegimport distinguishes between:
+
+- **Incoming receipts (purchasing)**: Invoices you receive from suppliers
+- **Outgoing receipts (sales/credit notes)**: Invoices you send to customers
+
+Separate import email addresses can be configured for the two types (e.g. for Buchhaltungsbutler).
+Amazon invoices are automatically classified as incoming receipts.
+
+For IMAP and SMB, separate source and processed folders can be configured for incoming and outgoing receipts.
+
+For scan uploads, the receipt type is selected with a radio button.
+
+---
+
+## Processing log
+
+`/log` lists all processed receipts with:
+
+- Timestamp, subject, sender
+- Receipt type (incoming/outgoing)
+- Number of attachments
+- Status (OK/error)
+- Error message (if any)
+- SMTP log (viewable)
+
+---
+
+## License
+
+Private project.
diff --git a/app/amazon_api.py b/app/amazon_api.py
new file mode 100644
index 0000000..44e4256
--- /dev/null
+++ b/app/amazon_api.py
@@ -0,0 +1,515 @@
+"""Amazon Business API client using SP-API (Reconciliation + Document API).
+
+This module provides API-based invoice retrieval as an alternative to browser automation.
+Uses OAuth2 with LWA (Login with Amazon) for authentication.
+
+Document API workflow (EU):
+1. POST /reports/.../reports → reportId
+2. 
GET /reports/.../reports/{reportId} → poll until DONE → reportDocumentId +3. GET /reports/.../documents/{reportDocumentId} → presigned URL +4. Download + decompress (gzip then zip) → PDF +""" + +import asyncio +import gzip +import io +import logging +import urllib.parse +import zipfile +from datetime import datetime, timedelta +from pathlib import Path + +import httpx + +from app.database import get_settings, save_settings, add_log_entry, is_invoice_downloaded, mark_invoice_downloaded +from app.mail_processor import _connect_smtp, _build_forward_email, _send_with_log + +logger = logging.getLogger(__name__) + +# Amazon LWA (Login with Amazon) endpoints +LWA_TOKEN_URL = "https://api.amazon.com/auth/o2/token" + +# Amazon Business OAuth consent URLs per domain (NOT sellercentral!) +AB_OAUTH_URLS = { + "amazon.de": "https://www.amazon.de/b2b/abws/oauth", + "amazon.at": "https://www.amazon.de/b2b/abws/oauth", # AT uses DE + "amazon.fr": "https://www.amazon.fr/b2b/abws/oauth", + "amazon.it": "https://www.amazon.it/b2b/abws/oauth", + "amazon.es": "https://www.amazon.es/b2b/abws/oauth", + "amazon.co.uk": "https://www.amazon.co.uk/b2b/abws/oauth", + "amazon.com": "https://www.amazon.com/b2b/abws/oauth", +} + +# Amazon Business API endpoints per region +AB_API_ENDPOINTS = { + "eu": "https://eu.business-api.amazon.com", + "na": "https://na.business-api.amazon.com", +} + +# API versions +RECONCILIATION_VERSION = "2021-01-08" +REPORTS_VERSION = "2021-09-30" + +# Domain to region mapping +DOMAIN_REGION = { + "amazon.de": "eu", + "amazon.at": "eu", + "amazon.fr": "eu", + "amazon.it": "eu", + "amazon.es": "eu", + "amazon.co.uk": "eu", + "amazon.com": "na", +} + +# Domain to marketplace ID +DOMAIN_MARKETPLACE = { + "amazon.de": "A1PA6795UKMFR9", + "amazon.at": "A2NODRKZP88ZB9", + "amazon.fr": "A13V1IB3VIYZZH", + "amazon.it": "APJ6JRA9NG5V4", + "amazon.es": "A1RKKUPIHCS9HS", + "amazon.co.uk": "A1F83G8C2ARO7P", + "amazon.com": "ATVPDKIKX0DER", +} + + +def 
get_oauth_authorize_url(application_id: str, redirect_uri: str, domain: str = "amazon.de", state: str = "") -> str: + """Generate the OAuth authorization URL for Amazon Business API consent.""" + base_url = AB_OAUTH_URLS.get(domain, AB_OAUTH_URLS["amazon.de"]) + params = { + "applicationId": application_id, + "state": state or "auth", + "redirect_uri": redirect_uri, + } + return f"{base_url}?{urllib.parse.urlencode(params)}" + + +async def exchange_auth_code(code: str, client_id: str, client_secret: str, redirect_uri: str) -> dict: + """Exchange authorization code for refresh token via LWA.""" + async with httpx.AsyncClient() as client: + resp = await client.post(LWA_TOKEN_URL, data={ + "grant_type": "authorization_code", + "code": code, + "client_id": client_id, + "client_secret": client_secret, + "redirect_uri": redirect_uri, + }) + if resp.status_code != 200: + logger.error(f"LWA Token-Exchange fehlgeschlagen: {resp.status_code} {resp.text}") + return {"error": f"Token-Exchange fehlgeschlagen: {resp.status_code} - {resp.text}"} + data = resp.json() + logger.info("LWA Token-Exchange erfolgreich") + return data + + +async def get_access_token(client_id: str, client_secret: str, refresh_token: str) -> str | None: + """Get a fresh access token using the refresh token.""" + async with httpx.AsyncClient() as client: + resp = await client.post(LWA_TOKEN_URL, data={ + "grant_type": "refresh_token", + "refresh_token": refresh_token, + "client_id": client_id, + "client_secret": client_secret, + }) + if resp.status_code != 200: + logger.error(f"Access-Token-Refresh fehlgeschlagen: {resp.status_code} {resp.text}") + return None + data = resp.json() + return data.get("access_token") + + +async def check_api_configured() -> dict: + """Check if API credentials are configured and valid.""" + settings = await get_settings() + client_id = settings.get("amazon_client_id", "") + client_secret = settings.get("amazon_client_secret", "") + refresh_token = 
settings.get("amazon_refresh_token", "") + + if not client_id or not client_secret: + return {"configured": False, "authorized": False, "error": "Client-ID oder Client-Secret fehlt"} + + if not refresh_token: + return {"configured": True, "authorized": False, "error": "Noch nicht autorisiert (Refresh-Token fehlt)"} + + # Try to get an access token to verify credentials + access_token = await get_access_token(client_id, client_secret, refresh_token) + if not access_token: + return {"configured": True, "authorized": False, "error": "Autorisierung abgelaufen - bitte erneut autorisieren"} + + return {"configured": True, "authorized": True} + + +async def _get_api_client(settings: dict) -> tuple[httpx.AsyncClient, str] | None: + """Create an authenticated API client. Returns (client, region) or None.""" + client_id = settings.get("amazon_client_id", "") + client_secret = settings.get("amazon_client_secret", "") + refresh_token = settings.get("amazon_refresh_token", "") + + if not all([client_id, client_secret, refresh_token]): + return None + + access_token = await get_access_token(client_id, client_secret, refresh_token) + if not access_token: + return None + + domain = settings.get("amazon_domain", "amazon.de") + region = DOMAIN_REGION.get(domain, "eu") + + client = httpx.AsyncClient( + base_url=AB_API_ENDPOINTS.get(region, AB_API_ENDPOINTS["eu"]), + headers={ + "x-amz-access-token": access_token, + "Content-Type": "application/json", + "user-agent": "Belegimport/1.0 (Language=Python/3.12)", + }, + timeout=30.0, + ) + return client, region + + +async def get_transactions(settings: dict, since_date: datetime) -> list[dict]: + """Get transactions via Reconciliation API.""" + result = await _get_api_client(settings) + if not result: + return [] + + client, region = result + + transactions = [] + try: + # feedEndDate must not exceed current UTC time + now_utc = datetime.utcnow() + params = { + "feedStartDate": since_date.strftime("%Y-%m-%dT00:00:00Z"), + "feedEndDate": 
now_utc.strftime("%Y-%m-%dT%H:%M:%SZ"), + } + + next_token = None + page = 0 + while True: + page += 1 + if next_token: + params["nextPageToken"] = next_token + + logger.info(f"Amazon API: Reconciliation-Abfrage Seite {page}...") + resp = await client.get( + f"/reconciliation/{RECONCILIATION_VERSION}/transactions", + params=params, + ) + + if resp.status_code != 200: + logger.error(f"Amazon API: Reconciliation fehlgeschlagen: {resp.status_code} {resp.text}") + break + + data = resp.json() + page_transactions = data.get("transactions", []) + transactions.extend(page_transactions) + logger.info(f"Amazon API: Seite {page}: {len(page_transactions)} Transaktionen") + + next_token = data.get("nextPageToken") + if not next_token: + break + + except Exception as e: + logger.error(f"Amazon API: Reconciliation-Fehler: {e}") + finally: + await client.aclose() + + logger.info(f"Amazon API: {len(transactions)} Transaktionen gesamt") + return transactions + + +async def _create_invoice_report(client: httpx.AsyncClient, order_id: str, marketplace_id: str) -> str | None: + """Step 1: Create a report request for invoice PDF.""" + body = { + "reportType": "GET_AB_INVOICE_PDF", + "marketplaceIds": [marketplace_id], + "reportOptions": { + "orderId": order_id, + "documentType": "Invoice", + }, + } + try: + resp = await client.post(f"/reports/{REPORTS_VERSION}/reports", json=body) + if resp.status_code in (200, 202): + data = resp.json() + report_id = data.get("reportId") + logger.info(f"Amazon API: Report erstellt für {order_id}: {report_id}") + return report_id + else: + logger.warning(f"Amazon API: Report-Erstellung fehlgeschlagen für {order_id}: {resp.status_code} {resp.text}") + return None + except Exception as e: + logger.error(f"Amazon API: Report-Erstellung Fehler: {e}") + return None + + +async def _poll_report_status(client: httpx.AsyncClient, report_id: str, max_wait: int = 120) -> str | None: + """Step 2: Poll report status until DONE. 
Returns reportDocumentId.""" + for i in range(max_wait // 15 + 1): + try: + resp = await client.get(f"/reports/{REPORTS_VERSION}/reports/{report_id}") + if resp.status_code != 200: + logger.warning(f"Amazon API: Report-Status fehlgeschlagen: {resp.status_code}") + return None + + data = resp.json() + status = data.get("processingStatus", "") + + if status == "DONE": + doc_id = data.get("reportDocumentId") + logger.info(f"Amazon API: Report {report_id} fertig: documentId={doc_id}") + return doc_id + elif status in ("CANCELLED", "FATAL"): + logger.warning(f"Amazon API: Report {report_id} fehlgeschlagen: {status}") + return None + else: + logger.debug(f"Amazon API: Report {report_id} Status: {status}, warte...") + await asyncio.sleep(15) + except Exception as e: + logger.error(f"Amazon API: Report-Status Fehler: {e}") + return None + + logger.warning(f"Amazon API: Report {report_id} Timeout nach {max_wait}s") + return None + + +async def _download_report_document(client: httpx.AsyncClient, document_id: str) -> bytes | None: + """Step 3: Get presigned URL and download + decompress the PDF.""" + try: + resp = await client.get(f"/reports/{REPORTS_VERSION}/documents/{document_id}") + if resp.status_code != 200: + logger.warning(f"Amazon API: Document-URL fehlgeschlagen: {resp.status_code}") + return None + + data = resp.json() + url = data.get("url", "") + compression = data.get("compressionAlgorithm", "") + + if not url: + logger.warning(f"Amazon API: Keine Download-URL für Document {document_id}") + return None + + # Download the document (presigned S3 URL, expires in 5 min) + async with httpx.AsyncClient(timeout=60.0) as dl_client: + dl_resp = await dl_client.get(url) + if dl_resp.status_code != 200: + logger.warning(f"Amazon API: Document-Download fehlgeschlagen: {dl_resp.status_code}") + return None + + content = dl_resp.content + + # Decompress: EU documents are gzip-compressed, then the content is a zip file + if compression == "GZIP" or content[:2] == b'\x1f\x8b': 
+ try: + content = gzip.decompress(content) + except Exception: + pass # might not be gzipped + + # Check if it's a zip file containing the PDF + if content[:2] == b'PK': + try: + with zipfile.ZipFile(io.BytesIO(content)) as zf: + for name in zf.namelist(): + if name.lower().endswith('.pdf'): + content = zf.read(name) + break + except Exception: + pass # might not be a zip + + # Verify it's a PDF + if content[:4] == b'%PDF': + logger.info(f"Amazon API: PDF heruntergeladen: {len(content)} Bytes") + return content + else: + logger.warning(f"Amazon API: Heruntergeladenes Dokument ist kein PDF (starts: {content[:20]})") + return None + + except Exception as e: + logger.error(f"Amazon API: Document-Download Fehler: {e}") + return None + + +async def download_invoice(settings: dict, order_id: str) -> bytes | None: + """Download invoice PDF via Document API (3-step async process).""" + result = await _get_api_client(settings) + if not result: + return None + + client, region = result + domain = settings.get("amazon_domain", "amazon.de") + marketplace_id = DOMAIN_MARKETPLACE.get(domain, DOMAIN_MARKETPLACE["amazon.de"]) + + try: + # Step 1: Create report + report_id = await _create_invoice_report(client, order_id, marketplace_id) + if not report_id: + return None + + # Step 2: Poll until done + document_id = await _poll_report_status(client, report_id) + if not document_id: + return None + + # Step 3: Download document + return await _download_report_document(client, document_id) + + except Exception as e: + logger.error(f"Amazon API: Invoice-Download-Fehler für {order_id}: {e}") + return None + finally: + await client.aclose() + + +async def process_amazon_api() -> dict: + """Process Amazon invoices via API (Reconciliation + Document API).""" + settings = await get_settings() + + if settings.get("amazon_enabled") != "true": + return {"processed": 0, "skipped": 0, "errors": 0} + + # Check API credentials + status = await check_api_configured() + if not 
status.get("authorized"): + error_msg = status.get("error", "API nicht konfiguriert") + logger.warning(f"Amazon API: {error_msg}") + return {"processed": 0, "skipped": 0, "errors": 0, "error": error_msg} + + domain = settings.get("amazon_domain", "amazon.de") + + # Determine date range + since_str = settings.get("amazon_since_date", "") + if since_str: + try: + since_date = datetime.strptime(since_str, "%Y-%m-%d") + except ValueError: + since_date = datetime.now() - timedelta(days=30) + else: + since_date = datetime.now() - timedelta(days=30) + + logger.info(f"Amazon API: Import gestartet: domain={domain}, seit={since_date.strftime('%Y-%m-%d')}") + + # Connect SMTP + import_email = settings.get("import_email_eingang") or settings.get("import_email", "") + if not import_email: + error_msg = "Keine Import-Email für Eingangsbelege konfiguriert" + logger.error(f"Amazon API: {error_msg}") + await add_log_entry("Amazon-Import", f"Amazon ({domain})", 0, "error", error_msg, beleg_type="eingang") + return {"processed": 0, "skipped": 0, "errors": 1, "error": error_msg} + + smtp = _connect_smtp(settings) + if not smtp: + error_msg = "SMTP-Verbindung fehlgeschlagen" + logger.error(f"Amazon API: {error_msg}") + await add_log_entry("Amazon-Import", f"Amazon ({domain})", 0, "error", error_msg, beleg_type="eingang") + return {"processed": 0, "skipped": 0, "errors": 1, "error": error_msg} + + processed = 0 + skipped = 0 + errors = 0 + + try: + # Get transactions via Reconciliation API + transactions = await get_transactions(settings, since_date) + + if not transactions: + logger.info("Amazon API: Keine Transaktionen gefunden") + await save_settings({"amazon_last_sync": datetime.now().strftime("%Y-%m-%d %H:%M")}) + await add_log_entry( + "Amazon-Import (API)", f"Amazon ({domain})", 0, + "success", "Keine neuen Rechnungen gefunden", beleg_type="eingang", + ) + smtp.quit() + return {"processed": 0, "skipped": 0, "errors": 0} + + # Extract unique orders with their line items + orders = 
{} + for txn in transactions: + line_items = txn.get("transactionLineItems", []) + for item in line_items: + oid = item.get("orderId", "") + if oid and oid not in orders: + orders[oid] = { + "orderId": oid, + "invoiceNumber": txn.get("invoiceNumber", ""), + "transactionDate": txn.get("transactionDate", ""), + } + # Fallback: if no line items, use transaction-level orderId + if not line_items: + oid = txn.get("orderId", "") + if oid and oid not in orders: + orders[oid] = { + "orderId": oid, + "invoiceNumber": txn.get("invoiceNumber", ""), + "transactionDate": txn.get("transactionDate", ""), + } + + logger.info(f"Amazon API: {len(orders)} eindeutige Bestellungen gefunden") + + for oid, order_info in orders.items(): + # Check if already downloaded + if await is_invoice_downloaded(oid, oid): + skipped += 1 + continue + + # Download invoice PDF + pdf_data = await download_invoice(settings, oid) + + if pdf_data: + # Save debug copy if enabled + if settings.get("debug_save_amazon_pdfs") == "true": + debug_dir = Path("/data/uploads") / "amazon_invoices" + debug_dir.mkdir(parents=True, exist_ok=True) + debug_path = debug_dir / f"Amazon_Rechnung_{oid}.pdf" + debug_path.write_bytes(pdf_data) + logger.info(f"Amazon API: Debug-PDF gespeichert: {debug_path}") + + # Send via SMTP + filename = f"Amazon_Rechnung_{oid}.pdf" + subject = f"Amazon Rechnung - {oid}" + from_addr = settings.get("smtp_username", "belegimport@local") + msg = _build_forward_email( + from_addr=from_addr, + to_addr=import_email, + original_subject=subject, + original_from=f"Amazon ({domain})", + attachments=[(filename, pdf_data)], + ) + smtp_log = _send_with_log(smtp, msg) + await add_log_entry( + subject, f"Amazon ({domain})", 1, + "success", "", import_email, smtp_log, beleg_type="eingang", + ) + await mark_invoice_downloaded(oid, oid) + processed += 1 + logger.info(f"Amazon API: Rechnung für {oid} gesendet") + else: + # No invoice available for this order + await mark_invoice_downloaded(oid, oid) + skipped 
+= 1 + logger.debug(f"Amazon API: Keine Rechnung für {oid}") + + except Exception as e: + logger.error(f"Amazon API: Import-Fehler: {e}", exc_info=True) + errors += 1 + await add_log_entry( + "Amazon-Import (API)", f"Amazon ({domain})", 0, + "error", str(e), beleg_type="eingang", + ) + finally: + try: + smtp.quit() + except Exception: + pass + + await save_settings({"amazon_last_sync": datetime.now().strftime("%Y-%m-%d %H:%M")}) + + if processed > 0 or errors > 0: + summary = f"{processed} verarbeitet, {skipped} übersprungen, {errors} Fehler" + await add_log_entry( + "Amazon-Import (API, Zusammenfassung)", f"Amazon ({domain})", processed, + "success" if errors == 0 else "warning", summary, beleg_type="eingang", + ) + + logger.info(f"Amazon API: Import fertig: {processed} verarbeitet, {skipped} übersprungen, {errors} Fehler") + return {"processed": processed, "skipped": skipped, "errors": errors} diff --git a/app/amazon_processor.py b/app/amazon_processor.py index b188b59..bfce613 100644 --- a/app/amazon_processor.py +++ b/app/amazon_processor.py @@ -736,12 +736,23 @@ async def _process_amazon_inner() -> dict: return {"processed": 0, "errors": 0, "error": error_detail} processed, skipped, errors = result["processed"], result["skipped"], result["errors"] + batch_done = result.get("batch_done", False) # Update last sync date await save_settings({"amazon_last_sync": datetime.now().strftime("%Y-%m-%d %H:%M")}) - # Log summary if nothing was processed - if processed == 0 and errors == 0: + # Log summary + if processed > 0 and batch_done: + summary = f"{processed} Rechnung(en) importiert. Weitere beim nächsten Abruf." 
+ await add_log_entry( + email_subject="Amazon-Import (Batch)", + email_from=f"Amazon ({domain})", + attachments_count=processed, + status="success", + error_message=summary, + sent_to=import_email, + ) + elif processed == 0 and errors == 0: if skipped > 0: summary = f"Alle Rechnungen bereits importiert ({skipped} übersprungen)" else: @@ -787,13 +798,18 @@ async def _process_amazon_inner() -> dict: async def _collect_and_process_orders(page, domain, since_date, smtp_conn, settings, import_email) -> dict | None: """Collect orders AND process invoices page by page. - This ensures invoice buttons are visible when we try to click them, - because we process each page's orders before navigating to the next page. + Uses BATCH processing: only processes a limited number of invoices per run + to avoid Amazon session degradation. The scheduler will pick up remaining + orders in subsequent runs (already-imported orders are skipped automatically). + Returns None if session is invalid, otherwise dict with processed/skipped/errors counts. """ + MAX_INVOICES_PER_RUN = 2 # Limit to avoid Amazon session issues + processed = 0 skipped = 0 errors = 0 + batch_done = False # Flag: batch limit reached, stop processing # Navigate to orders page if needed actual_url = page.url @@ -813,6 +829,50 @@ async def _collect_and_process_orders(page, domain, since_date, smtp_conn, setti if "order-history" not in actual_url and "your-orders" not in actual_url: return None + # Reset to page 1 via SPA navigation (NOT page.reload() which kills session!) 
+ # Click the "Bestellungen" tab or use the time filter to refresh the order list + logger.info(f"Amazon: Refreshe Bestellliste via SPA (aktuelle URL: {actual_url})...") + try: + refreshed = await page.evaluate("""() => { + // Strategy 1: Click the "Bestellungen" tab to reset to page 1 + const tabs = document.querySelectorAll('a[href*="your-orders"], a[href*="order-history"]'); + for (const tab of tabs) { + const text = (tab.innerText || '').trim(); + if ((text === 'Bestellungen' || text === 'Orders') && tab.offsetParent !== null) { + tab.click(); + return 'tab'; + } + } + // Strategy 2: Click pagination page 1 link + const page1Links = document.querySelectorAll('.a-pagination a[href*="pagination/1"], .a-pagination li:first-child a'); + for (const link of page1Links) { + if (link.offsetParent !== null) { + link.click(); + return 'pagination'; + } + } + // Strategy 3: Click the time filter to trigger a refresh + const filterSelect = document.querySelector('select[name="orderFilter"], select#orderFilter, select#time-filter'); + if (filterSelect) { + // Re-select the current value to trigger change event + const event = new Event('change', {bubbles: true}); + filterSelect.dispatchEvent(event); + return 'filter'; + } + return null; + }""") + if refreshed: + logger.info(f"Amazon: Bestellliste refreshed via {refreshed}") + await asyncio.sleep(3) + try: + await page.wait_for_load_state("networkidle", timeout=15000) + except Exception: + pass + else: + logger.info("Amazon: Kein SPA-Refresh möglich, verwende aktuelle Ansicht") + except Exception as e: + logger.warning(f"Amazon: SPA-Refresh fehlgeschlagen: {e}") + # Try to set time filter now = datetime.now() days_back = (now - since_date).days @@ -862,8 +922,14 @@ async def _collect_and_process_orders(page, domain, since_date, smtp_conn, setti logger.info(f"Amazon: Seite {page_num}: {len(page_orders)} gefunden, {len(new_orders)} neu") total_orders += len(new_orders) - # Process invoices for THIS page's orders immediately 
(buttons are visible now) + # Process invoices for THIS page's orders immediately for order in new_orders: + # Check batch limit + if processed >= MAX_INVOICES_PER_RUN: + batch_done = True + logger.info(f"Amazon: Batch-Limit erreicht ({MAX_INVOICES_PER_RUN} Rechnungen). Rest beim nächsten Abruf.") + break + order_id = order.get("id", "?") try: if await is_invoice_downloaded(order_id, order_id): @@ -920,7 +986,8 @@ async def _collect_and_process_orders(page, domain, since_date, smtp_conn, setti ) await mark_invoice_downloaded(order_id, order_id) - await _human_delay(2.0, 4.0) + # Long delay between orders to avoid Amazon rate-limiting + await _human_delay(8.0, 15.0) except Exception as e: errors += 1 @@ -933,6 +1000,10 @@ async def _collect_and_process_orders(page, domain, since_date, smtp_conn, setti error_message=str(e), ) + # Stop if batch limit reached + if batch_done: + break + # Navigate to next page has_next = await page.evaluate("""() => { const nextLink = document.querySelector('.a-pagination .a-last:not(.a-disabled) a'); @@ -960,8 +1031,9 @@ async def _collect_and_process_orders(page, domain, since_date, smtp_conn, setti else: break - logger.info(f"Amazon: Gesamt {total_orders} Bestellungen auf {page_num} Seite(n)") - return {"processed": processed, "skipped": skipped, "errors": errors} + status = "Batch-Limit" if batch_done else "komplett" + logger.info(f"Amazon: Gesamt {total_orders} Bestellungen auf {page_num} Seite(n), Status: {status}") + return {"processed": processed, "skipped": skipped, "errors": errors, "batch_done": batch_done} async def _collect_orders(page, domain: str, since_date: datetime) -> list[dict] | None: diff --git a/app/database.py b/app/database.py index 75fe50b..67adfb1 100644 --- a/app/database.py +++ b/app/database.py @@ -10,7 +10,7 @@ logger = logging.getLogger(__name__) _fernet = None -ENCRYPTED_KEYS = {"imap_password", "smtp_password", "smb_password", "amazon_password"} +ENCRYPTED_KEYS = {"imap_password", "smtp_password", 
"smb_password", "amazon_password", "amazon_client_secret", "amazon_refresh_token"} DEFAULT_SETTINGS = { "imap_server": "", @@ -53,6 +53,12 @@ DEFAULT_SETTINGS = { "amazon_domain": "amazon.de", "amazon_last_sync": "", "amazon_since_date": "", + # Amazon API (SP-API / Business API) + "amazon_app_id": "", # amzn1.sp.solution.xxxxx (from Developer Portal) + "amazon_client_id": "", # amzn1.application-oa2-client.xxxxx (LWA Client ID) + "amazon_client_secret": "", # LWA Client Secret + "amazon_refresh_token": "", + "amazon_mode": "browser", # "browser" or "api" # Debug "debug_save_amazon_pdfs": "false", } diff --git a/app/main.py b/app/main.py index c370ef4..8e52f0f 100644 --- a/app/main.py +++ b/app/main.py @@ -31,6 +31,12 @@ from app.amazon_processor import ( close_interactive_login as amazon_close_interactive, is_interactive_login_active as amazon_login_active, ) +from app.amazon_api import ( + get_oauth_authorize_url, + exchange_auth_code, + check_api_configured, + process_amazon_api, +) logging.basicConfig( level=getattr(logging, os.environ.get("LOG_LEVEL", "INFO").upper(), logging.INFO), @@ -388,6 +394,10 @@ async def api_amazon_settings(request: Request): "amazon_email": body.get("amazon_email", ""), "amazon_password": body.get("amazon_password") or current.get("amazon_password", ""), "amazon_since_date": body.get("amazon_since_date", ""), + "amazon_mode": body.get("amazon_mode", "browser"), + "amazon_app_id": body.get("amazon_app_id", ""), + "amazon_client_id": body.get("amazon_client_id", ""), + "amazon_client_secret": body.get("amazon_client_secret") or current.get("amazon_client_secret", ""), } await save_settings(data) return JSONResponse({"success": True}) @@ -395,9 +405,110 @@ async def api_amazon_settings(request: Request): @app.get("/api/amazon-status") async def api_amazon_status(): - valid = await amazon_check_session() - login_active = amazon_login_active() - return JSONResponse({"session_valid": valid, "login_active": login_active}) + settings = await 
get_settings()
+    mode = settings.get("amazon_mode", "browser")
+
+    if mode == "api":
+        api_status = await check_api_configured()
+        return JSONResponse({
+            "mode": "api",
+            "session_valid": api_status.get("authorized", False),
+            "login_active": False,
+            "api_configured": api_status.get("configured", False),
+            "api_authorized": api_status.get("authorized", False),
+        })
+    else:
+        valid = await amazon_check_session()
+        login_active = amazon_login_active()
+        return JSONResponse({
+            "mode": "browser",
+            "session_valid": valid,
+            "login_active": login_active,
+        })
+
+
+def _get_oauth_redirect_uri(request: Request) -> str:
+    """Get OAuth redirect URI from env var or request."""
+    base = os.environ.get("OAUTH_REDIRECT_BASE", "").rstrip("/")
+    if not base:
+        base = str(request.base_url).rstrip("/")
+    return f"{base}/api/amazon-oauth-callback"
+
+
+@app.get("/api/amazon-oauth-url")
+async def api_amazon_oauth_url(request: Request):
+    """Generate OAuth authorization URL for Amazon Business API."""
+    settings = await get_settings()
+    app_id = settings.get("amazon_app_id", "")
+    if not app_id:
+        return JSONResponse({"error": "App-ID nicht konfiguriert"}, status_code=400)
+
+    redirect_uri = _get_oauth_redirect_uri(request)
+    domain = settings.get("amazon_domain", "amazon.de")
+    state = str(uuid.uuid4())
+
+    url = get_oauth_authorize_url(app_id, redirect_uri, domain, state)
+    return JSONResponse({"url": url, "state": state})
+
+
+@app.get("/api/amazon-oauth-callback")
+async def api_amazon_oauth_callback(request: Request):
+    """Handle OAuth callback from Amazon."""
+    code = request.query_params.get("spapi_oauth_code") or request.query_params.get("code", "")
+    error = request.query_params.get("error", "")
+
+    if error:
+        return HTMLResponse(f"Autorisierung fehlgeschlagen {error} Fenster kann geschlossen werden.")
+
+    if not code:
+        return HTMLResponse("Fehler: Kein Autorisierungscode erhalten Fenster kann geschlossen werden.")
+
+    settings = await get_settings()
+    client_id = settings.get("amazon_client_id", "")
+    client_secret = settings.get("amazon_client_secret", "")
+    redirect_uri = _get_oauth_redirect_uri(request)
+
+    result = await exchange_auth_code(code, client_id, client_secret, redirect_uri)
+
+    if "error" in result:
+        return HTMLResponse(f"Token-Exchange fehlgeschlagen {result['error']}")
+
+    refresh_token = result.get("refresh_token", "")
+    if refresh_token:
+        await save_settings({"amazon_refresh_token": refresh_token})
+        return HTMLResponse(
+            "Autorisierung erfolgreich!"
+            "Refresh-Token wurde gespeichert. Dieses Fenster kann geschlossen werden."
+            ""
+        )
+
+    return HTMLResponse("Fehler: Kein Refresh-Token erhalten")
+
+
+@app.post("/api/amazon-oauth-exchange")
+async def api_amazon_oauth_exchange(request: Request):
+    """Manual OAuth code exchange - user pastes the code from the redirect URL."""
+    body = await request.json()
+    code = body.get("code", "").strip()
+    if not code:
+        return JSONResponse({"error": "Kein Code angegeben"}, status_code=400)
+
+    settings = await get_settings()
+    client_id = settings.get("amazon_client_id", "")
+    client_secret = settings.get("amazon_client_secret", "")
+    redirect_uri = _get_oauth_redirect_uri(request)
+
+    result = await exchange_auth_code(code, client_id, client_secret, redirect_uri)
+
+    if "error" in result:
+        return JSONResponse({"error": result["error"]}, status_code=400)
+
+    refresh_token = result.get("refresh_token", "")
+    if refresh_token:
+        await save_settings({"amazon_refresh_token": refresh_token})
+        return JSONResponse({"success": True})
+
+    return JSONResponse({"error": "Kein Refresh-Token erhalten"}, status_code=400)
 
 
 @app.post("/api/amazon-login")
@@ -462,7 +573,12 @@ async def api_amazon_logout():
 
 @app.post("/api/amazon-process")
 async def api_amazon_process():
-    result = await process_amazon()
+    settings = await get_settings()
+    mode = settings.get("amazon_mode", "browser")
+    if mode == "api":
+        result = await process_amazon_api()
+    else:
+        result = await process_amazon()
     return JSONResponse(result)
 
 
diff --git a/app/scheduler.py b/app/scheduler.py
index e22e09e..d27e68b 100644
--- a/app/scheduler.py
+++ b/app/scheduler.py
@@ -6,6 +6,8 @@ from apscheduler.triggers.interval import IntervalTrigger
 from app.mail_processor import process_mailbox
 from app.smb_processor import process_smb_share
 from app.amazon_processor import process_amazon
+from app.amazon_api import process_amazon_api
+from app.database import get_settings
 
 logger = logging.getLogger(__name__)
 
@@ -34,7 +36,12 @@ async def _run_processor():
     # Amazon separately with timeout - must not block next scheduler runs
     logger.info("Starte automatische Amazon-Verarbeitung...")
     try:
-        amazon_result = await asyncio.wait_for(process_amazon(), timeout=300)
+        settings = await get_settings()
+        amazon_mode = settings.get("amazon_mode", "browser")
+        if amazon_mode == "api":
+            amazon_result = await asyncio.wait_for(process_amazon_api(), timeout=300)
+        else:
+            amazon_result = await asyncio.wait_for(process_amazon(), timeout=300)
         logger.info(f"Amazon-Verarbeitung abgeschlossen: {amazon_result}")
     except asyncio.TimeoutError:
         logger.error("Amazon-Verarbeitung nach 5 Minuten abgebrochen (Timeout)")
diff --git a/app/templates/platforms.html b/app/templates/platforms.html
index 17c074b..37d64c5 100644
--- a/app/templates/platforms.html
+++ b/app/templates/platforms.html
@@ -17,6 +17,13 @@
+
- Leer = letzte 30 Tage
- {% if settings.get('amazon_last_sync') %}
- Letzter Abruf: {{ settings.get('amazon_last_sync') }}
- {% endif %}
+ API-Zugangsdaten (Amazon Business API)
+ Aus dem Solution Provider Portal
+ {% if settings.get('amazon_refresh_token') %}
+ Refresh-Token: gespeichert
+ {% else %}
+ Refresh-Token: fehlt - bitte autorisieren
+ {% endif %}
+ Browser-Zugangsdaten
+ {% if settings.get('amazon_last_sync') %}
+ Letzter Abruf: {{ settings.get('amazon_last_sync') }}
+ {% endif %}
@@ -57,16 +106,41 @@

 Anmeldung & Abruf
- Session:
+ Status: Wird geprüft...
@@ -97,6 +171,16 @@
 {% endblock %}
diff --git a/docker-compose.yml b/docker-compose.yml
index 8d754d3..32d1a43 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -10,4 +10,5 @@ services:
       - DB_PATH=/data/belegimport.db
       - TZ=Europe/Berlin
       - LOG_LEVEL=DEBUG
+      - OAUTH_REDIRECT_BASE=https://hacker-net.de
     restart: unless-stopped
diff --git a/requirements.txt b/requirements.txt
index 2018afa..6e322f1 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -14,3 +14,4 @@ sse-starlette==2.2.1
 smbprotocol==1.14.0
 playwright==1.49.1
 playwright-stealth==2.0.2
+httpx==0.28.1
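
Review note on the OAuth endpoints in `app/main.py`: `api_amazon_oauth_url` returns a fresh `state`, but the callback in this diff does not compare it against anything, and the `error` query parameter is interpolated into the HTML response unescaped. The following is a minimal sketch of one way to address both, assuming a single-process deployment (the in-memory dictionary would not survive restarts or multiple workers). `OAuthStateStore` and `render_message_page` are hypothetical names that do not exist in this repository, and the HTML tags are illustrative because the original markup is not visible here.

```python
import html
import secrets
import time


class OAuthStateStore:
    """In-memory store for one-time OAuth state values (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 600.0) -> None:
        self._ttl = ttl_seconds
        # Maps each issued state value to the monotonic time it was created.
        self._issued: dict[str, float] = {}

    def issue(self) -> str:
        """Create a random state value and remember when it was issued."""
        state = secrets.token_urlsafe(32)
        self._issued[state] = time.monotonic()
        return state

    def consume(self, state: str) -> bool:
        """Return True once for a known, unexpired state; False otherwise."""
        issued_at = self._issued.pop(state, None)
        if issued_at is None:
            return False
        return (time.monotonic() - issued_at) <= self._ttl


def render_message_page(title: str, detail: str = "") -> str:
    """Build a small HTML page with all dynamic text escaped."""
    safe_title = html.escape(title)
    safe_detail = html.escape(detail)
    # Illustrative markup only; the real template tags are an assumption.
    return f"<html><body><h2>{safe_title}</h2><p>{safe_detail}</p></body></html>"
```

In the handlers this could mean calling `store.issue()` in `/api/amazon-oauth-url` instead of `uuid.uuid4()`, rejecting the callback early when `store.consume(request.query_params.get("state", ""))` is false, and passing `error` and `result['error']` through `render_message_page`. Whether Amazon echoes `state` back unchanged for this flow should be confirmed against the SP-API authorization documentation.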