Site SEO Auditor
Run ID: 69cb4da161b1021a29a87cf2 | 2026-03-31 | SEO & Growth

Site SEO Auditor: Step 1 of 5 - Crawl Execution

This document details the execution and output of the initial crawl phase of your Site SEO Auditor. The crawl establishes the list of pages on which all subsequent audit steps operate.


Step 1: Puppeteer Crawl Execution

Description: This step uses Puppeteer, a Node.js library that drives a headless Chromium browser, to systematically navigate and discover the accessible pages on your website. Unlike traditional HTTP crawlers, which only fetch raw HTML, Puppeteer renders each page in a real browser engine, executing JavaScript and building the DOM as a user's browser would. As a result, links that appear only in dynamically loaded or client-side rendered content are discovered as well, giving a more complete picture of your site's structure and content.

Purpose: The primary goal of this phase is to generate a definitive list of all internal URLs within your domain. This list serves as the input for the subsequent audit steps, ensuring that every page is thoroughly analyzed against the 12-point SEO checklist.

Methodology:

  1. Headless Browser Emulation: Puppeteer launches a Chromium instance in a headless environment, simulating a real user's browser experience. This is critical for modern web applications that rely heavily on JavaScript for content rendering.
  2. Root URL Initiation: The crawl begins at the specified root URL of your website.
  3. Recursive Link Discovery:

* For each visited page, Puppeteer analyzes the rendered DOM to identify all valid internal links (<a> tags with href attributes pointing to the same domain).

* These discovered links are added to a queue for subsequent visitation, ensuring a thorough exploration of the site's architecture.

* To prevent infinite loops and redundant work, discovered links are de-duplicated: each unique URL is queued and processed only once.

  4. Robots.txt Adherence (Configurable): By default, the crawler respects your robots.txt file, ensuring that pages explicitly disallowed for crawling are not accessed. This behavior can be configured if a full, unrestricted audit is required (e.g., for internal staging environments).
  5. Initial Page Data Capture: During the crawl, for each discovered URL, the following basic information is captured:

* HTTP Status Code: To immediately identify broken links (e.g., 404 Not Found) or redirects (e.g., 301 Moved Permanently).

* Final URL: The resolved URL after any redirects.

* Parent URL: The URL from which the current page was discovered, helping to trace link paths.

* HTML Content (Initial Load): The raw HTML of the page upon initial load, before extensive DOM manipulation, is captured for later analysis.

  6. Error Handling & Timeouts: The crawler is equipped with robust error handling for network issues, page timeouts, and other potential problems, ensuring resilience and comprehensive reporting.
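
The link discovery and de-duplication steps above can be sketched without a browser. The following is a minimal, language-agnostic illustration in Python rather than the auditor's actual Puppeteer code: `get_links` is a hypothetical stand-in for rendering a page and collecting its href values, `is_allowed` is a hypothetical stand-in for the robots.txt check, and the three-page `SITE` map is invented purely to exercise the logic.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def normalize(base_url, href):
    """Resolve href against the page URL and drop any #fragment."""
    absolute, _fragment = urldefrag(urljoin(base_url, href))
    return absolute


def is_internal(url, root_url):
    """True when url is http(s) and on the same host as the root URL."""
    parsed = urlparse(url)
    same_host = parsed.netloc == urlparse(root_url).netloc
    return parsed.scheme in ("http", "https") and same_host


def crawl(root_url, get_links, is_allowed=lambda url: True):
    """Breadth-first crawl that visits each unique internal URL once."""
    seen = {root_url}
    queue = deque([(root_url, None)])  # (url, parent_url)
    records = []
    while queue:
        url, parent = queue.popleft()
        records.append({"url": url, "parentUrl": parent})
        for href in get_links(url):
            link = normalize(url, href)
            if link in seen or not is_internal(link, root_url) or not is_allowed(link):
                continue
            seen.add(link)
            queue.append((link, url))
    return records


# Hypothetical three-page site used only to exercise the sketch.
SITE = {
    "https://www.yourwebsite.com/": ["/products/", "/#top", "https://other.example/"],
    "https://www.yourwebsite.com/products/": ["/", "/old-page"],
    "https://www.yourwebsite.com/old-page": [],
}
pages = crawl("https://www.yourwebsite.com/", lambda url: SITE.get(url, []))
print([page["url"] for page in pages])
```

The `seen` set is what prevents loops: the link from /products/ back to the root is skipped because the root is already recorded. In a Python setting, `is_allowed` could wrap `urllib.robotparser.RobotFileParser.can_fetch` from the standard library; how the auditor itself evaluates robots.txt is not specified here.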

Output of Step 1: Discovered URLs & Basic Crawl Data

Upon completion of the crawl, the following data is generated and will serve as the foundation for the subsequent SEO audit steps. This output is stored in a temporary, structured format, ready for processing.

Deliverable: A comprehensive list of all unique, discoverable internal URLs on your website, along with initial crawl metadata for each.

Structure (Example Snippet):

[
  {
    "id": "unique-uuid-1",
    "url": "https://www.yourwebsite.com/",
    "resolvedUrl": "https://www.yourwebsite.com/",
    "httpStatusCode": 200,
    "parentUrl": null, // null for the starting page
    "crawlTimestamp": "2023-10-27T10:00:00Z",
    "initialHtmlSnippet": "<!DOCTYPE html><html lang=\"en\">..." // First 1000 chars of HTML
  },
  {
    "id": "unique-uuid-2",
    "url": "https://www.yourwebsite.com/products/",
    "resolvedUrl": "https://www.yourwebsite.com/products/",
    "httpStatusCode": 200,
    "parentUrl": "https://www.yourwebsite.com/",
    "crawlTimestamp": "2023-10-27T10:00:05Z",
    "initialHtmlSnippet": "<!DOCTYPE html><html lang=\"en\">..."
  },
  {
    "id": "unique-uuid-3",
    "url": "https://www.yourwebsite.com/old-page",
    "resolvedUrl": "https://www.yourwebsite.com/new-page", // If redirected
    "httpStatusCode": 301, // Indicates a redirect
    "parentUrl": "https://www.yourwebsite.com/products/",
    "crawlTimestamp": "2023-10-27T10:00:10Z",
    "initialHtmlSnippet": "<html><head><meta http-equiv=\"refresh\" content=\"0; URL=/new-page\"/></head><body>..."
  },
  {
    "id": "unique-uuid-4",
    "url": "https://www.yourwebsite.com/broken-link",
    "resolvedUrl": "https://www.yourwebsite.com/broken-link",
    "httpStatusCode": 404, // Indicates a broken link
    "parentUrl": "https://www.yourwebsite.com/about-us/",
    "crawlTimestamp": "2023-10-27T10:00:15Z",
    "initialHtmlSnippet": null // Or 404 page HTML
  }
  // ... many more entries
]

Key Information Provided:

  • id: A unique identifier for each discovered page, used for internal tracking.
  • url: The URL as it was discovered by the crawler.
  • resolvedUrl: The final URL after any redirects have been followed. This is crucial for auditing the correct page.
  • httpStatusCode: The HTTP status code returned by the server (e.g., 200 OK, 301 Moved Permanently, 404 Not Found, 500 Internal Server Error).
  • parentUrl: The URL of the page that contained the link leading to the current page. This helps in understanding the site's internal linking structure.
  • crawlTimestamp: The exact time the page was crawled.
  • initialHtmlSnippet: A brief snippet of the raw HTML content captured upon the initial page load.
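
As an illustration of how the `httpStatusCode` field might be consumed downstream, the sketch below groups crawl records by status class. The label names (`ok`, `redirect`, `broken`, `server_error`) and both helper functions are assumptions made for this example; they are not part of the auditor's output format.

```python
def classify(record):
    """Label a crawl record by the class of its HTTP status code."""
    status = record["httpStatusCode"]
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "broken"
    if 500 <= status < 600:
        return "server_error"
    return "unknown"


def group_by_label(records):
    """Map each label to the list of URLs that received it."""
    groups = {}
    for record in records:
        groups.setdefault(classify(record), []).append(record["url"])
    return groups


# Records trimmed to the two fields this sketch needs.
records = [
    {"url": "https://www.yourwebsite.com/", "httpStatusCode": 200},
    {"url": "https://www.yourwebsite.com/old-page", "httpStatusCode": 301},
    {"url": "https://www.yourwebsite.com/broken-link", "httpStatusCode": 404},
]
print(group_by_label(records))
```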

Next Steps

The comprehensive list of URLs and their initial crawl data from this step will now be passed to Step 2: Page Auditing & Data Extraction. In that phase, each of these discovered pages will be revisited by Puppeteer to perform a deep dive into its specific SEO attributes and collect all necessary data for the 12-point checklist.

hive_db Output

Step 2 of 5: hive_db → Diff - Comprehensive Audit Comparison

This document details the second critical step in your Site SEO Auditor workflow: the hive_db → diff process. This phase is responsible for intelligently comparing your site's most recent SEO audit against a previous baseline (typically the last successful audit), identifying precise changes, improvements, regressions, and new issues across all audited pages and metrics.


1. Step Overview & Purpose

The hive_db → diff step serves as the analytical core of your SEO auditing process. After the headless crawler (Step 1) has completed its comprehensive scan and stored the latest audit data in your dedicated MongoDB instance (hive_db), this step retrieves that fresh data alongside your site's previous audit report. It then performs a meticulous, page-by-page and metric-by-metric comparison to generate a detailed "diff" report.

Purpose: To provide a clear, actionable overview of how your site's SEO health has changed between audits. The diff lets you pinpoint what changed, track the impact of recent optimizations, identify emerging issues, and confirm resolved problems, none of which is visible from a single audit report in isolation.


2. Data Retrieval from hive_db

Before any comparison can occur, the necessary audit reports are retrieved from your dedicated hive_db MongoDB instance.

  • Current Audit Report: The full, freshly generated SiteAuditReport from the most recent crawl (Step 1). This report contains all 12 SEO checklist points for every page discovered on your site during the current run.
  • Previous Audit Report: The SiteAuditReport identified as the last successfully completed audit for your domain. This serves as the baseline for comparison. If no previous audit exists (e.g., first-ever run), the diff will effectively highlight all current issues as "new."

Each SiteAuditReport document in MongoDB is structured to include:

  • _id: Unique identifier for the report.
  • domain: The domain audited.
  • timestamp: When the audit was completed.
  • pages: An array of page objects, each containing:

* url: The specific URL audited.

* metrics: An object detailing the status/value for each of the 12 SEO checklist items.

* issues: An array of specific problems found on that page.
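
Selecting the baseline report can be sketched as follows, using the `_id`, `domain`, and `timestamp` fields listed above. This is an in-memory illustration rather than a MongoDB query; the function name `pick_baseline` and the sample reports are hypothetical, and the sketch assumes every stored report completed successfully.

```python
def pick_baseline(reports, domain, current_id):
    """Return the most recent report for the domain other than the current one.

    Returns None when no other report exists (e.g., a first-ever run).
    """
    candidates = [
        report for report in reports
        if report["domain"] == domain and report["_id"] != current_id
    ]
    # ISO 8601 UTC timestamps in one fixed format sort correctly as strings.
    return max(candidates, key=lambda report: report["timestamp"], default=None)


# Hypothetical stored reports, one per weekly run.
reports = [
    {"_id": "r1", "domain": "yourwebsite.com", "timestamp": "2023-10-13T02:00:00Z", "pages": []},
    {"_id": "r2", "domain": "yourwebsite.com", "timestamp": "2023-10-20T02:00:00Z", "pages": []},
    {"_id": "r3", "domain": "yourwebsite.com", "timestamp": "2023-10-27T02:00:00Z", "pages": []},
]
baseline = pick_baseline(reports, "yourwebsite.com", current_id="r3")
print(baseline["_id"])
```

When the function returns None, every issue in the current report would be treated as new, matching the first-run behavior described above.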


3. Diffing Methodology & Logic

The diffing process compares the two audit reports page by page (matching pages by URL) and then drills down into each of the 12 SEO metrics for every page present in both reports.

3.1. Page-Level Comparison

  1. New Pages: Identifies any URLs present in the Current Audit Report that were not found in the Previous Audit Report. These pages are treated as entirely new entries, and all their identified SEO issues are flagged as "new issues."
  2. Removed Pages: Identifies any URLs present in the Previous Audit Report that are no longer found in the Current Audit Report. While not a direct SEO issue, this can indicate structural changes, redirects, or accidental page removals that warrant attention.
  3. Existing Pages: For all URLs present in both audit reports, a detailed metric-by-metric comparison is performed.
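
The three page categories above amount to set operations on the URLs of the two reports. A minimal sketch, assuming each report exposes a `pages` array whose entries carry a `url` field; the function name and the sample paths are illustrative only:

```python
def compare_pages(previous_report, current_report):
    """Split URLs into new, removed, and existing pages."""
    previous_urls = {page["url"] for page in previous_report["pages"]}
    current_urls = {page["url"] for page in current_report["pages"]}
    return {
        "new": sorted(current_urls - previous_urls),
        "removed": sorted(previous_urls - current_urls),
        "existing": sorted(current_urls & previous_urls),
    }


# Hypothetical reports reduced to their page URLs (paths only, for brevity).
previous_report = {"pages": [{"url": "/"}, {"url": "/products/"}, {"url": "/old-page"}]}
current_report = {"pages": [{"url": "/"}, {"url": "/products/"}, {"url": "/new-page"}]}
print(compare_pages(previous_report, current_report))
```

Only the URLs in `existing` proceed to the metric-by-metric comparison.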

3.2. Metric-by-Metric Comparison (12-Point SEO Checklist)

For each existing page, the system compares the status, value, or presence of each of the 12 SEO metrics between the current and previous audit.

  1. Meta Title & Description Uniqueness:

* Logic: Compares the uniqueness status (unique, duplicate, missing) and the actual title/description content.

* Diff Outcomes:

* Resolved: Previously duplicate/missing, now unique and present.

* New Issue: Previously unique/present, now duplicate/missing.

* Regression: Previously unique, now duplicate with a new page.

* Content Change: Title/description content has changed (even if still unique).

  2. H1 Presence:

* Logic: Checks for the presence and content of the <h1> tag.

* Diff Outcomes:

* Resolved: H1 was missing, now present.

* New Issue: H1 was present, now missing.

* Content Change: H1 text has been modified.

  3. Image Alt Coverage:

* Logic: Compares the count of images without alt attributes.

* Diff Outcomes:

* Resolved: Number of missing alt texts has decreased.

* New Issue: Number of missing alt texts has increased.

* Regression: Previously all images had alt text, now some are missing.

  4. Internal Link Density:

* Logic: Compares the number of internal links pointing to and from the page.

* Diff Outcomes:

* Change: Significant increase or decrease in internal link count (configurable threshold for "significant").

  5. Canonical Tags:

* Logic: Compares the presence, validity, and target URL of the canonical tag.

* Diff Outcomes:

* Resolved: Canonical tag was missing/incorrect, now present/correct.

* New Issue: Canonical tag was present/correct, now missing/incorrect.

* Target Change: Canonical URL has been updated.

  6. Open Graph Tags:

* Logic: Compares the presence and content of essential Open Graph tags (og:title, og:description, og:image, og:url).

* Diff Outcomes:

* Resolved: OG tags were missing/incorrect, now present/correct.

* New Issue: OG tags were present/correct, now missing/incorrect.

* Content Change: Values for specific OG tags have been modified.

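  7. Core Web Vitals (LCP/CLS/FID):

* Logic: Compares the measured values for Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and First Input Delay (FID) against predefined thresholds and the previous audit's values. Note that FID can only be measured from real user input, so lab tools such as Lighthouse report Total Blocking Time (TBT) as a proxy, and that Interaction to Next Paint (INP) replaced FID as a Core Web Vital in March 2024.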

* Diff Outcomes:

* Improvement: Score has improved, or passed a critical threshold (e.g., LCP moved from "Needs Improvement" to "Good").

* Regression: Score has worsened, or failed a critical threshold.

* No Change: Scores remain consistent.

  8. Structured Data Presence:

* Logic: Compares the presence and types of structured data (e.g., Schema.org markup) detected on the page.

* Diff Outcomes:

* Resolved: Structured data was missing, now present.

* New Issue: Structured data was present, now missing.

* Schema Change: Different types of schema are now detected.

  9. Mobile Viewport:

* Logic: Compares the presence and correct configuration of the viewport meta tag.

* Diff Outcomes:

* Resolved: Viewport tag was missing/incorrect, now present/correct.

* New Issue: Viewport tag was present/correct, now missing/incorrect.
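
For threshold-based metrics such as Core Web Vitals, the comparison can be sketched as below. The LCP and CLS boundaries are Google's published values (LCP: good up to 2.5 s, poor above 4.0 s; CLS: good up to 0.1, poor above 0.25); the function names and the rule that any numeric decrease counts as an improvement are assumptions of this sketch, not confirmed behavior of the auditor.

```python
# (upper bound for "Good", upper bound for "Needs Improvement"); lower is better.
THRESHOLDS = {"lcp": (2.5, 4.0), "cls": (0.1, 0.25)}


def rate(metric, value):
    """Map a measured value to its Core Web Vitals rating band."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "Good"
    return "Needs Improvement" if value <= needs_improvement else "Poor"


def vitals_outcome(metric, previous, current):
    """Return (diff outcome, previous rating, current rating)."""
    if current < previous:
        outcome = "Improvement"
    elif current > previous:
        outcome = "Regression"
    else:
        outcome = "No Change"
    return outcome, rate(metric, previous), rate(metric, current)


print(vitals_outcome("lcp", 3.5, 2.1))
```

A production version would likely ignore changes below some noise tolerance, since lab measurements vary from run to run.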

3.3. Issue-Level Granularity

Beyond just metric status, the diff also tracks specific issues. If a page previously had "Duplicate Meta Title" and now doesn't, that specific issue is marked as "Resolved." If a new "Missing H1" issue appears, it's marked as "New." This granular tracking provides precise insights into problem resolution and emergence.


4. Output of the Diffing Process

The output of the hive_db → diff step is a structured Diff Report Object that clearly delineates all changes between the two audit runs. This object is then passed to the subsequent steps in the workflow.

4.1. High-Level Summary

  • Total Issues Found (Current): Current number of unique SEO issues across the site.
  • Total Issues Found (Previous): Number of unique SEO issues in the last audit.
  • Net Change in Issues: (Current - Previous).
  • New Issues Discovered: Count of issues present in the current audit that were not in the previous.
  • Issues Resolved: Count of issues present in the previous audit that are no longer in the current.
  • Regressions: Count of metrics/pages that moved from a "good" state to a "bad" state (e.g., Core Web Vitals score worsened, unique meta title became duplicate).
  • Pages Added: Count of new pages identified.
  • Pages Removed: Count of pages no longer found.
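
The issue counts in this summary can be derived from two sets of (page URL, issue) pairs, one per audit. The sketch below is a worked example under that assumption; the key names in the returned dictionary and the sample issues are hypothetical.

```python
def summarize(previous, current):
    """Compute summary counts from sets of (page_url, issue) pairs."""
    return {
        "total_current": len(current),
        "total_previous": len(previous),
        "net_change": len(current) - len(previous),
        "new_issues": len(current - previous),
        "resolved_issues": len(previous - current),
    }


previous = {("/a", "Duplicate Meta Title"), ("/b", "Missing H1")}
current = {("/b", "Missing H1"), ("/c", "Missing H1"), ("/c", "Missing Canonical Tag")}
summary = summarize(previous, current)
print(summary)
```

Here the net change is 3 - 2 = 1, which equals new issues minus resolved issues (2 - 1). That identity holds for any pair of sets and makes a useful consistency check on a generated report.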

4.2. Detailed Page-Level Deltas

For each affected page, a detailed breakdown will be provided:

  • URL: The specific page where changes occurred.
  • Status: NEW_PAGE, REMOVED_PAGE, MODIFIED_PAGE.
  • Changes Array: A list of specific changes for that page, including:

* metric: The SEO metric affected (e.g., meta_title_uniqueness).

* type: NEW_ISSUE, RESOLVED_ISSUE, REGRESSION, VALUE_CHANGE.

* description: A human-readable description of the change (e.g., "Meta Title is now duplicate with /page-b," "H1 tag is now present," "LCP improved from 3.5s to 2.1s").

* previous_value: The value/status from the previous audit.

* current_value: The value/status from the current audit.
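
One way to build the changes array for a modified page is sketched below. It assumes each metric is stored as a dictionary with a `status` of "pass" or "fail" and a `value`, which is a simplification chosen for this example. It also omits the REGRESSION type and the human-readable description, since the rules for assigning them are not spelled out here.

```python
def page_changes(url, previous_metrics, current_metrics):
    """Build a MODIFIED_PAGE delta from two metric dictionaries."""
    changes = []
    for metric in sorted(previous_metrics.keys() & current_metrics.keys()):
        before, after = previous_metrics[metric], current_metrics[metric]
        if before == after:
            continue  # nothing changed for this metric
        if before["status"] != after["status"]:
            change_type = "RESOLVED_ISSUE" if after["status"] == "pass" else "NEW_ISSUE"
        else:
            change_type = "VALUE_CHANGE"
        changes.append({
            "metric": metric,
            "type": change_type,
            "previous_value": before["value"],
            "current_value": after["value"],
        })
    return {"url": url, "status": "MODIFIED_PAGE", "changes": changes}


# Hypothetical metric snapshots for one page.
previous_metrics = {
    "h1_presence": {"status": "fail", "value": None},
    "canonical_tag": {"status": "pass", "value": "/page-a"},
    "mobile_viewport": {"status": "pass", "value": "present"},
}
current_metrics = {
    "h1_presence": {"status": "pass", "value": "Product Overview"},
    "canonical_tag": {"status": "pass", "value": "/page-a/"},
    "mobile_viewport": {"status": "pass", "value": "present"},
}
delta = page_changes("/page-a", previous_metrics, current_metrics)
print(delta)
```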

4.3. Specific Issue Tracking

The report will categorize issues:

  • New Critical Issues: High-priority problems that have just emerged.
  • Resolved Critical Issues: High-priority problems that have been successfully addressed.
  • New Warnings/Improvements: Lower-priority issues or positive changes.

5. Key Benefits & Actionability

The detailed diff generated by this step provides immense value:

  • Track Progress: Clearly see the impact of your SEO efforts and optimizations over time.
  • Early Warning System: Instantly identify regressions or new issues that could negatively affect your search rankings.
  • Prioritization: Focus resources on addressing newly emerged critical issues or regressions first.
  • Validation: Confirm that implemented fixes (e.g., adding missing alt text, correcting canonicals) have been successfully deployed and detected.
  • Comprehensive Audit Trail: Maintain a historical record of your site's SEO health evolution.

6. Next Steps in the Workflow

The comprehensive Diff Report Object generated in this step is immediately passed to the subsequent phases of the workflow:

  • Step 3: Gemini → Fix Generation: Any identified "New Issues" or "Regressions" will be sent to Gemini for AI-powered generation of exact, actionable fixes. "Resolved Issues" will be noted as successful remediation.
  • Step 4: Report Generation: The Diff Report, along with Gemini's generated fixes, will form the core of your final, user-facing SiteAuditReport document, including the "before/after" comparison.
  • Step 5: Notification & Storage: The final report will be stored in hive_db and delivered to you via your preferred notification channels.
gemini Output

Workflow Step: gemini → batch_generate - Automated SEO Fix Generation

This document details the execution of Step 3 of 5 in the "Site SEO Auditor" workflow, focusing on the automated generation of precise fixes for identified SEO issues using Google Gemini.


1. Step Overview: Gemini's Role in Automated Fix Generation

Workflow: Site SEO Auditor

Current Step: gemini → batch_generate

Description: This crucial step leverages the advanced capabilities of Google Gemini to analyze the "broken elements" or SEO deficiencies identified by the headless crawler. Gemini then generates exact, actionable fixes for these issues in a batch process, significantly streamlining the SEO optimization workflow.

2. Purpose and Intelligence Behind the Step

The primary purpose of this step is to transform raw audit findings into concrete, implementable solutions. Instead of merely flagging issues, the system intelligently provides the how-to-fix for each problem.

  • Intelligent Analysis: Gemini receives detailed context about each SEO issue detected (e.g., specific URL, problematic HTML snippet, the exact audit rule violated, surrounding content). It then applies its vast understanding of SEO best practices, web development standards, and content optimization to diagnose the root cause and formulate an optimal solution.
  • Batch Processing: The system is designed to send multiple identified issues to Gemini concurrently or in rapid succession, allowing for efficient generation of fixes across numerous pages and different types of problems within a single audit cycle. This ensures that a comprehensive set of recommendations is available promptly.

3. Input to Gemini for Fix Generation

For each identified SEO issue, Gemini receives a structured prompt containing the following critical information:

  • Page URL: The exact URL where the issue was detected.
  • Audit Point Violation: The specific SEO checklist item that failed (e.g., "Meta Title Uniqueness", "Missing H1 Tag", "Image Alt Coverage", "Broken Internal Link").
  • Problematic HTML/Content Snippet: A direct excerpt of the HTML or content surrounding the error, providing Gemini with essential context.
  • Error Description: A clear, concise description of the detected issue (e.g., "Duplicate meta title found", "Image lacks an alt attribute", "H1 tag is missing from the page", "Canonical tag points to incorrect URL").
  • Audit Data: Any relevant metrics or data points from the crawler, such as current Core Web Vitals scores, link destination, or existing Open Graph tag values.

Example Input for a Missing H1:


Prompt: "On URL: https://www.example.com/product-page, the 'H1 Presence' audit failed. The page HTML snippet near the expected H1 position is: `<div class='header'><p>Product Overview</p></div>`. Please provide the exact HTML fix to add a semantically correct H1 tag."
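
Assembling such prompts and grouping them into batches might look like the following sketch. No Gemini API call is made; the issue dictionary keys, the function names, and the batch size are all assumptions made for illustration.

```python
def build_prompt(issue):
    """Format one audit issue as a fix-generation prompt."""
    return (
        f"On URL: {issue['url']}, the '{issue['audit_point']}' audit failed. "
        f"{issue['description']} "
        f"The relevant HTML snippet is: `{issue['snippet']}`. "
        "Please provide the exact fix."
    )


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical issue record mirroring the input fields listed above.
issue = {
    "url": "https://www.example.com/product-page",
    "audit_point": "H1 Presence",
    "description": "H1 tag is missing from the page.",
    "snippet": "<div class='header'><p>Product Overview</p></div>",
}
prompts = [build_prompt(issue) for _ in range(5)]
for batch in batches(prompts, size=2):
    print(len(batch))
```

Each batch would then be submitted to the model as a unit, with the batch size tuned to the API's rate and size limits.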

4. Gemini's Output: Exact, Actionable Fixes

Gemini processes the input and generates precise, ready-to-implement solutions. The output for each fix is structured to be highly actionable and developer-friendly.

Common Fix Formats Include:

  • HTML Snippets: Specific HTML code to insert, replace, or modify.

Example (Missing H1):


        <!-- Recommended fix: Insert this H1 tag after the opening <body> or within the main content area -->
        <h1 class="page-title">Your Product Page Title Here</h1>

Example (Image Alt Text):


        <!-- Recommended fix: Add or update the alt attribute for the image -->
        <img src="/assets/product-image.jpg" alt="[Descriptive alt text for product image]">
  • Meta Tag Recommendations: Corrected or newly generated meta titles, descriptions, canonical tags, and Open Graph tags.

Example (Duplicate Meta Description):


        <!-- Recommended fix: Update the meta description for uniqueness and relevance -->
        <meta name="description" content="[Unique, keyword-rich description for this specific page, max 160 characters]">

Example (Incorrect Canonical Tag):


        <!-- Recommended fix: Ensure the canonical tag points to the correct, preferred version of the URL -->
        <link rel="canonical" href="https://www.example.com/correct-product-page-url/">
  • Structured Data Snippets: JSON-LD code for specific schema types (e.g., Product, Article, FAQPage) where structured data is missing or incomplete.

Example (Missing Product Schema):


        <!-- Recommended fix: Add this JSON-LD script to the <head> or <body> -->
        <script type="application/ld+json">
        {
          "@context": "https://schema.org",
          "@type": "Product",
          "name": "Your Product Name",
          "image": "https://www.example.com/assets/product-image.jpg",
          "description": "A brief description of your product.",
          "sku": "SKU12345",
          "offers": {
            "@type": "Offer",
            "priceCurrency": "USD",
            "price": "99.99"
          }
        }
        </script>
  • Content/Copy Suggestions: Recommendations for improving content for SEO, such as keyword integration or readability (e.g., for meta descriptions, H1s, or general body text).
  • Developer Instructions: Clear, concise instructions on where and how to apply the fix within the existing codebase or CMS.

5. Integration and Storage

The generated fixes are immediately integrated into the SiteAuditReport document stored in MongoDB. Each fix is associated with the specific audit issue it addresses.

  • Before/After Diff: The system maintains a "before" state (the problematic HTML/configuration) and captures the "after" state (the recommended fix from Gemini). This allows for clear tracking of changes and verification of implementation.
  • Actionable Reporting: The SiteAuditReport now contains not just a list of problems, but a comprehensive action plan, ready for review and implementation by your development or marketing team.

6. Benefits to the Customer

This gemini → batch_generate step delivers significant value:

  • Automated Problem Solving: Moves beyond mere identification to provide direct solutions, saving valuable time and resources.
  • Reduced Manual Effort: Eliminates the need for manual diagnosis and research by SEO specialists or developers for common issues.
  • Accelerated SEO Improvements: Faster turnaround from audit to implementation, leading to quicker improvements in search engine rankings and visibility.
  • Precision and Accuracy: Each fix is generated against the specific URL, audit rule, and HTML snippet of the issue it addresses, in line with current SEO best practices and technical standards.
  • Developer-Ready Output: Provides code snippets and clear instructions that can be directly handed over to development teams for implementation.
  • Comprehensive Action Plan: Transforms the audit report into a detailed roadmap for site optimization.

This step ensures that your Site SEO Auditor isn't just a diagnostic tool, but a powerful, proactive engine for continuous SEO improvement, providing not just the "what" but the "how" for every identified opportunity.

hive_db Output

Step 4 of 5: Data Persistence - Storing Your Site SEO Audit Report

This crucial step marks the successful persistence of your comprehensive SEO audit findings. Following the completion of the in-depth scan and analysis by our headless crawler, all generated insights, identified issues, and Gemini-powered fixes are now securely stored within your dedicated hive_db database. This ensures a persistent, historical record of your site's SEO health.


Comprehensive SiteAuditReport Document Structure

The audit results are meticulously structured into a SiteAuditReport document, which is then upserted into the SiteAuditReports collection within your dedicated hive_db instance. This document captures every facet of the audit, enabling a holistic view of your site's SEO health.

Each SiteAuditReport document includes, but is not limited to, the following key fields:

  • audit_id: A unique identifier for this specific audit run, facilitating easy retrieval and tracking.
  • site_url: The primary URL of the audited website.
  • audit_timestamp: The exact date and time when the audit was completed, crucial for historical analysis.
  • overall_score: An aggregate score reflecting the overall SEO health of your site based on the 12-point checklist.
  • page_reports: An array of detailed reports for each individual page visited by the crawler, containing:

* page_url: The URL of the specific page.

* meta_title: Content and uniqueness status (Pass/Fail).

* meta_description: Content and uniqueness status (Pass/Fail).

* h1_presence: Boolean indicating the presence of an H1 tag (Pass/Fail).

* image_alt_coverage: Percentage of images with descriptive alt text (Pass/Fail).

* internal_link_density: Number of internal links found on the page.

* canonical_tag: Presence and correctness of the canonical URL (Pass/Fail).

* open_graph_tags: Presence and validity of Open Graph tags for social sharing (Pass/Fail).

* core_web_vitals: Detailed scores for Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and First Input Delay (FID) (Pass/Fail).

* structured_data_presence: Boolean indicating the presence of schema markup (Pass/Fail).

* mobile_viewport: Presence of the viewport meta tag for mobile responsiveness (Pass/Fail).

* status: An aggregate status for the page (e.g., "Good," "Needs Improvement," "Critical").

  • identified_issues: A consolidated, site-wide list of all broken elements and SEO deficiencies discovered during the audit, categorized for clarity.
  • gemini_fixes: For each identified issue, a detailed, actionable fix generated by our Gemini AI, often including specific code snippets, configuration changes, or step-by-step instructions.
  • before_after_diff: A critical component that automatically compares the current audit's findings with the most recent previous audit for your site. This highlights:

* Improvements: Metrics or issues that have been resolved or improved.

* Regressions: Metrics or issues that have worsened since the last audit.

* New Issues: Problems identified for the first time.

This differential analysis provides immediate context on the impact of recent changes to your website.


Secure and Efficient Data Upsert into MongoDB

The upsert operation is a highly efficient database command that intelligently handles the data persistence process within your hive_db (MongoDB) instance:

  • Mechanism:

Insertion: If a SiteAuditReport matching the upsert key does not already exist in the SiteAuditReports collection, a new document is inserted. This typically occurs during the very first audit or a manual re-run that generates a new unique audit_id.

Update Logic: If a SiteAuditReport matching the upsert key already exists, the existing document is updated with the latest audit results. This keeps a single, up-to-date record per audit run, which matters when post-processing or additional data enrichment occurs after the first write.

  • Key for Upsert: The primary key used for this operation is a combination of site_url and audit_timestamp. This ensures that each unique audit run for your site is correctly identified and either stored as a new record or updated appropriately.
  • Database: All data is securely stored within your dedicated hive_db instance, leveraging MongoDB's robust NoSQL capabilities for scalability and flexibility.
  • Automation: This process is automatically triggered immediately after every scheduled audit (every Sunday at 2 AM) or following any on-demand audit request, ensuring your SEO health data is always current and readily available.
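
The insert-or-update behavior can be illustrated with an in-memory stand-in for the collection, keyed on site_url and audit_timestamp as described above. This is a sketch of the semantics only; the `upsert` function and the plain dictionary used as a store are hypothetical. Against a real MongoDB instance, the equivalent PyMongo call would be along the lines of `collection.update_one(filter, {"$set": report}, upsert=True)`.

```python
def upsert(store, report):
    """Insert the report, or merge it into the record with the same key."""
    key = (report["site_url"], report["audit_timestamp"])
    existed = key in store
    # Fields in the incoming report overwrite stored fields of the same name.
    store[key] = {**store.get(key, {}), **report}
    return "updated" if existed else "inserted"


store = {}
first = upsert(store, {
    "site_url": "https://www.yourwebsite.com",
    "audit_timestamp": "2023-10-29T02:00:00Z",
    "overall_score": 82,
})
second = upsert(store, {
    "site_url": "https://www.yourwebsite.com",
    "audit_timestamp": "2023-10-29T02:00:00Z",
    "gemini_fixes": ["Add an H1 tag"],
})
print(first, second, len(store))
```

The second call enriches the existing record with fixes instead of creating a duplicate, which is the post-processing case the update path is meant to handle.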

Empowering Your SEO Strategy with Persistent Data

By diligently storing your SiteAuditReport data in hive_db, we provide you with invaluable resources to continuously enhance and refine your SEO strategy:

  • Historical Performance Tracking: Effortlessly monitor your site's SEO health over extended periods, observing trends, identifying patterns, and understanding the long-term impact of implemented changes.
  • Measurable Progress & ROI: The before_after_diff component allows you to clearly visualize the improvements (or regressions) on specific metrics and issues, providing concrete, data-driven evidence of your SEO efforts' return on investment.
  • Actionable Insights & Fixes Repository: Access a searchable and organized repository of all identified issues and their corresponding Gemini-generated fixes. This empowers your team to prioritize, plan, and execute corrective actions with maximum efficiency.
  • Comprehensive Reporting Foundation: This meticulously stored data forms the bedrock for advanced reporting, dynamic dashboards, and custom analytics, offering deeper, more granular insights into your site's performance and areas for optimization.
  • Compliance & Record Keeping: Maintain a robust historical record of your site's SEO compliance and technical health, which can be vital for internal audits, regulatory requirements, or demonstrating due diligence.

What Happens Next (Step 5 of 5)

With the SiteAuditReport successfully persisted in hive_db, the final step (Step 5) involves the critical distribution of these findings. This typically includes sending a comprehensive report to designated stakeholders and updating your centralized dashboard with the latest SEO health metrics, ensuring you and your team are immediately informed and ready to act on the audit results.

hive_db Output

Step 5 of 5: Database Update & Report Generation (hive_db → conditional_update)

This final step in the "Site SEO Auditor" workflow is critical for persisting the comprehensive audit results, enabling historical tracking, and providing actionable insights through a "before/after" comparison. All the data gathered by the headless crawler, analyzed against the 12-point SEO checklist, and enhanced with Gemini's fix suggestions, is now securely stored in your dedicated MongoDB database.


1. Purpose of this Step: Data Persistence and Historical Tracking

The conditional_update operation ensures that every SEO audit performed (either on-demand or via the weekly schedule) is meticulously recorded. This step serves to:

  • Store Comprehensive Audit Reports: Each audit run generates a complete SiteAuditReport document.
  • Enable Trend Analysis: By storing historical data, we can track your site's SEO performance over time, identifying improvements or regressions.
  • Provide Actionable Diffs: A "before/after" difference is calculated against the most recent previous audit, highlighting specific changes and their impact.
  • Maintain Data Integrity: Ensures all audit data, including the Gemini-generated fixes, is safely and reliably stored.

2. Data Stored: The SiteAuditReport Document

A new SiteAuditReport document is created in your MongoDB database for each audit run. This document is designed to be comprehensive and easily queryable.

Key Fields within a SiteAuditReport:

  • _id: Unique identifier for the audit report.
  • siteUrl: The URL of the website that was audited.
  • auditTimestamp: Date and time when the audit was completed (ISO format).
  • triggerType: Indicates how the audit was initiated (scheduled or on-demand).
  • overallStatus: An aggregated status for the entire site (Pass, Warning, Fail) based on the severity of identified issues.
  • summary: Site-wide aggregates.
      * totalPagesAudited: Count of unique pages visited.
      * pagesWithIssues: Count of pages that failed one or more checks.
      * seoHealthScore: A calculated percentage score reflecting overall site SEO health.
      * issuesByCategory: Breakdown of issues by type (e.g., metaTitleMissing: 5, imageAltMissing: 12).
  • pageResults: An array of objects, each representing the audit findings for a single page.
      * pageUrl: The URL of the specific page.
      * pageStatus: Pass, Warning, or Fail for this individual page.
      * seoChecks: An object containing the results for the 12 SEO checklist items (ten are listed here):
          * metaTitle: { status: 'Pass'/'Fail', value: '...', issue: '...', fixSuggestion: '...' }
          * metaDescription: { status: 'Pass'/'Fail', value: '...', issue: '...', fixSuggestion: '...' }
          * h1Presence: { status: 'Pass'/'Fail', count: 1, issue: '...', fixSuggestion: '...' }
          * imageAltCoverage: { status: 'Pass'/'Fail', coverage: '85%', issues: [...], fixSuggestion: '...' }
          * internalLinkDensity: { status: 'Pass'/'Fail', count: 20, issue: '...', fixSuggestion: '...' }
          * canonicalTag: { status: 'Pass'/'Fail', value: '...', issue: '...', fixSuggestion: '...' }
          * openGraphTags: { status: 'Pass'/'Fail', tags: {...}, issue: '...', fixSuggestion: '...' }
          * coreWebVitals: { status: 'Pass'/'Fail', lcp: '...', cls: '...', fid: '...', issue: '...', fixSuggestion: '...' }
          * structuredData: { status: 'Pass'/'Fail', types: [...], issue: '...', fixSuggestion: '...' }
          * mobileViewport: { status: 'Pass'/'Fail', issue: '...', fixSuggestion: '...' }
        (Each issue field contains a description of the problem, and fixSuggestion holds the Gemini-generated actionable fix.)
      * brokenElements: An array of specific issues identified on the page, including their Gemini-generated fixes. Each entry contains:
          * issueDescription: e.g., "Missing H1 tag"
          * geminiFix: e.g., "Add an H1 tag to the page, ensuring it's unique and semantically relevant to the page content. Example: <H1>Your Page Title Here</H1>"
  • diffFromPreviousAudit: (Detailed in the next section)
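
As an illustration of how the summary fields relate to pageResults, here is a minimal Python sketch. It is not the auditor's actual implementation. The function name `summarize` is hypothetical, the seoHealthScore formula (share of passed checks) is an assumption because the report does not define it, and issues are keyed by check name rather than by labels such as metaTitleMissing.

```python
from collections import Counter


def summarize(page_results):
    """Derive a summary block from a list of pageResults entries."""
    issues_by_category = Counter()
    pages_with_issues = 0
    total_checks = 0
    passed_checks = 0

    for page in page_results:
        page_failed = False
        for check_name, result in page["seoChecks"].items():
            total_checks += 1
            if result["status"] == "Pass":
                passed_checks += 1
            else:
                page_failed = True
                issues_by_category[check_name] += 1
        if page_failed:
            pages_with_issues += 1

    # ASSUMPTION: score = share of passed checks, as a whole-number percentage.
    score = round(100 * passed_checks / total_checks) if total_checks else 100

    return {
        "totalPagesAudited": len(page_results),
        "pagesWithIssues": pages_with_issues,
        "seoHealthScore": score,
        "issuesByCategory": dict(issues_by_category),
    }


# Hypothetical sample data: two pages, two checks each.
pages = [
    {"pageUrl": "https://example.com/", "seoChecks": {
        "metaTitle": {"status": "Pass"}, "h1Presence": {"status": "Pass"}}},
    {"pageUrl": "https://example.com/about", "seoChecks": {
        "metaTitle": {"status": "Fail"}, "h1Presence": {"status": "Pass"}}},
]
summary = summarize(pages)
print(summary)
```

Deriving the summary from pageResults in one pass keeps the two parts of the document consistent with each other, which matters once reports are compared across runs.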

3. Conditional Logic and Diff Calculation

Before storing the new report, the system performs a conditional_update check:

  1. Retrieve Previous Audit: The system first queries the database to find the most recent SiteAuditReport for the siteUrl being audited.
  2. Calculate Difference (Diff):
      * If a previous audit exists, a detailed "before/after" comparison is performed between the newly generated audit report and the previous one.
      * This diff highlights:
          * New Issues: Problems present in the current audit that were not in the previous one.
          * Resolved Issues: Problems present in the previous audit that are no longer present in the current one.
          * Changed Metrics: Significant changes in Core Web Vitals scores, link counts, or other quantifiable metrics.
          * Page Status Changes: Pages that moved from Fail to Pass, Pass to Warning, etc.
  3. Store New Report with Diff: The newly generated SiteAuditReport (including the calculated diffFromPreviousAudit) is then inserted as a new document into the SiteAuditReports collection. This ensures a complete, timestamped record of every audit.

Structure of diffFromPreviousAudit:

  • previousAuditId: Reference to the _id of the previous audit report.
  • overallStatusChange: { from: 'Pass', to: 'Warning' }
  • summaryChanges:
      * seoHealthScoreChange: +5% or -2%
      * newIssuesCount: 3
      * resolvedIssuesCount: 2
  • pageLevelChanges: An array detailing changes on specific pages.
      * pageUrl: URL of the page with changes.
      * statusChange: { from: 'Pass', to: 'Fail' }
      * issueChanges:
          * newIssues: [Missing Meta Description, Poor LCP Score]
          * resolvedIssues: [Duplicate H1]
      * metricChanges: [LCP: from 2.5s to 3.8s]
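
The comparison behind this structure can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the system's actual diff logic: the function names `issues_of` and `diff_reports` are hypothetical, an issue is identified by the name of its failing check, the score change is returned as a number of percentage points rather than a string such as +5%, pages absent from the previous audit are compared against an empty baseline, and metricChanges is omitted.

```python
def issues_of(page):
    """Names of the checks that did not pass on one pageResults entry."""
    if page is None:
        return set()
    return {name for name, r in page["seoChecks"].items() if r["status"] != "Pass"}


def diff_reports(previous, current):
    """Build a diffFromPreviousAudit-shaped dict from two report dicts."""
    prev_pages = {p["pageUrl"]: p for p in previous["pageResults"]}
    page_changes = []
    new_total = resolved_total = 0

    for page in current["pageResults"]:
        # ASSUMPTION: a page missing from the previous audit has no prior issues.
        before = prev_pages.get(page["pageUrl"])
        before_status = before["pageStatus"] if before else None
        new = sorted(issues_of(page) - issues_of(before))
        resolved = sorted(issues_of(before) - issues_of(page))
        status_changed = before_status != page["pageStatus"]
        if not (new or resolved or status_changed):
            continue  # nothing changed on this page
        new_total += len(new)
        resolved_total += len(resolved)
        change = {
            "pageUrl": page["pageUrl"],
            "issueChanges": {"newIssues": new, "resolvedIssues": resolved},
        }
        if status_changed:
            change["statusChange"] = {"from": before_status, "to": page["pageStatus"]}
        page_changes.append(change)

    return {
        "previousAuditId": previous["_id"],
        "overallStatusChange": {"from": previous["overallStatus"],
                                "to": current["overallStatus"]},
        "summaryChanges": {
            # Numeric difference in percentage points (assumed representation).
            "seoHealthScoreChange": current["summary"]["seoHealthScore"]
                                    - previous["summary"]["seoHealthScore"],
            "newIssuesCount": new_total,
            "resolvedIssuesCount": resolved_total,
        },
        "pageLevelChanges": page_changes,
    }


# Hypothetical sample reports for one page across two audits.
previous = {
    "_id": "audit-001", "overallStatus": "Warning", "summary": {"seoHealthScore": 90},
    "pageResults": [{"pageUrl": "https://example.com/", "pageStatus": "Fail",
                     "seoChecks": {"metaDescription": {"status": "Pass"},
                                   "h1Presence": {"status": "Fail"}}}],
}
current = {
    "_id": "audit-002", "overallStatus": "Fail", "summary": {"seoHealthScore": 85},
    "pageResults": [{"pageUrl": "https://example.com/", "pageStatus": "Fail",
                     "seoChecks": {"metaDescription": {"status": "Fail"},
                                   "h1Presence": {"status": "Pass"}}}],
}
diff = diff_reports(previous, current)
print(diff)
```

Using set differences on issue names gives new and resolved issues symmetrically, so the same helper serves both directions of the comparison.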


4. Database Interaction Details

The hive_db → conditional_update operation specifically involves:

  • Collection: SiteAuditReports (or a similar designated collection within your MongoDB instance).
  • Operation: Primarily an insertOne operation for the new SiteAuditReport document.
  • Indexing: The siteUrl and auditTimestamp fields are indexed to facilitate efficient retrieval of historical reports and diff calculations.
  • Data Validation: Basic schema validation is applied to ensure the integrity and expected structure of the stored audit data.
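
The retrieve, diff, and insert sequence can be sketched as follows. To keep the example runnable without a database, it uses a small in-memory class in place of the real collection. `InMemoryCollection`, `store_report`, and `simple_diff` are hypothetical names, and storing `None` as the diff for a site's first audit is an assumption, since the report does not say what is stored in that case.

```python
from datetime import datetime, timezone


class InMemoryCollection:
    """Hypothetical stand-in for the SiteAuditReports collection."""

    def __init__(self):
        self.docs = []

    def latest_for(self, site_url):
        """Most recent report for a site, or None if there is none."""
        matches = [d for d in self.docs if d["siteUrl"] == site_url]
        return max(matches, key=lambda d: d["auditTimestamp"], default=None)

    def insert(self, doc):
        self.docs.append(doc)
        return doc["_id"]


def store_report(collection, report, make_diff):
    """Look up the previous report, attach a diff if one exists, then insert."""
    previous = collection.latest_for(report["siteUrl"])
    # ASSUMPTION: the first audit of a site has no diff to attach.
    report["diffFromPreviousAudit"] = make_diff(previous, report) if previous else None
    return collection.insert(report)


def simple_diff(previous, current):
    """Placeholder diff that only records the previous report's id."""
    return {"previousAuditId": previous["_id"]}


# Hypothetical sample run: two audits of the same site, one week apart.
reports = InMemoryCollection()
first = {"_id": "audit-001", "siteUrl": "https://example.com",
         "auditTimestamp": datetime(2024, 1, 7, 2, 0, tzinfo=timezone.utc)}
second = {"_id": "audit-002", "siteUrl": "https://example.com",
          "auditTimestamp": datetime(2024, 1, 14, 2, 0, tzinfo=timezone.utc)}
store_report(reports, first, simple_diff)
store_report(reports, second, simple_diff)
print(second["diffFromPreviousAudit"])
```

Against a real MongoDB instance accessed through PyMongo, `latest_for` would roughly correspond to `find_one({"siteUrl": site_url}, sort=[("auditTimestamp", -1)])` and `insert` to `insert_one(doc)`. A compound index such as `create_index([("siteUrl", 1), ("auditTimestamp", -1)])` would serve that lookup, although whether the system uses a compound index or two separate ones is not stated here.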

5. Accessibility and Next Steps for the Customer

Upon completion of this step, your SEO audit data is fully processed and stored.

  • Accessing Reports: You can access these detailed SiteAuditReport documents directly via your PantheraHive dashboard, which will present the data in an easily digestible format, including visual trends and the "before/after" diffs.
  • Actionable Insights: The Gemini-generated fix suggestions are now available for immediate implementation, allowing your team to address critical SEO issues efficiently.
  • Ongoing Monitoring: The scheduled weekly audits will continue to update this database, providing continuous monitoring and enabling you to track the long-term impact of your SEO efforts.

This concludes the "Site SEO Auditor" workflow. Your site's SEO performance is now comprehensively audited, and actionable insights are stored and ready for your review.
