Workflow Description: A headless crawler visits every page on your site and audits it against a 12-point SEO checklist. Broken elements get sent to Gemini, which generates the exact fix.
This document details the output from the "gemini → batch_generate" step, where our AI-powered engine analyzes identified SEO issues and provides precise, actionable solutions.
Following the site crawl and audit, the system has identified specific SEO discrepancies and areas for improvement across your website. In this step, these "broken elements" are sent to the Gemini model, which draws on web standards, SEO best practices, and code generation to produce ready-to-implement fixes.
The goal of this phase is to move beyond mere identification of problems to providing concrete, developer-ready solutions, significantly streamlining your optimization efforts.
Below are examples of the detailed, actionable fixes generated by Gemini for various common SEO issues. Each example includes the identified problem, Gemini's proposed solution (often with code snippets), and an explanation of the fix.
Issue: missing `<h1>` heading, which is crucial for conveying the main topic of the page to search engines and users.
Page: https://www.yourwebsite.com/blog/latest-article-title
<!-- Original hero image tag (before fix): -->
<!-- <img src="/images/hero-banner-full.jpg" alt="Homepage hero banner"> -->
<!-- Gemini's generated fix for LCP optimization: -->
<img
src="/images/hero-banner-full.webp"
alt="Homepage hero banner"
loading="eager"
fetchpriority="high"
width="1920"
height="1080"
srcset="/images/hero-banner-full-small.webp 600w, /images/hero-banner-full-medium.webp 1200w, /images/hero-banner-full.webp 1920w"
sizes="(max-width: 600px) 600px, (max-width: 1200px) 1200px, 1920px"
>
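<!-- imagesrcset/imagesizes mirror the <img> above so the preload fetches the same candidate the browser will render -->
<link rel="preload" as="image" href="/images/hero-banner-full.webp" imagesrcset="/images/hero-banner-full-small.webp 600w, /images/hero-banner-full-medium.webp 1200w, /images/hero-banner-full.webp 1920w" imagesizes="(max-width: 600px) 600px, (max-width: 1200px) 1200px, 1920px">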
This document details the execution of the first crucial step in your Site SEO Auditor workflow: the comprehensive crawl of your website using Puppeteer. This phase is designed to simulate a real user's browser experience, ensuring an accurate and thorough collection of all publicly accessible pages and their associated raw data, forming the foundation for subsequent SEO analysis.
The primary objective of this step is to systematically visit and render every discoverable page on your specified website, collecting essential raw data that will be processed in the subsequent SEO audit steps. This process ensures that the audit is based on the actual content and structure experienced by users and search engine crawlers.
Our headless browser, powered by Puppeteer, executes a meticulous crawl using the following methodology:
* Starting point: The crawl begins at your site's root URL (e.g., https://www.yourwebsite.com/).
* Link discovery: The crawler follows the internal `<a>` (anchor) links within the rendered HTML of each visited page. This ensures comprehensive discovery of all interconnected pages.
* robots.txt: The crawler adheres strictly to the directives specified in your robots.txt file, ensuring that no pages or sections explicitly disallowed for crawling are accessed. This respects your site's crawl policies.
For each successfully crawled page, the following critical data points are extracted:
* Time to First Byte (TTFB): Measures the responsiveness of your web server.
* DOM Content Loaded: Indicates when the initial HTML document has been completely loaded and parsed.
* Load Event Fired: Signifies when the page and all its dependent resources (stylesheets, images, etc.) have finished loading.
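As a rough illustration of how the three timings above could be derived, the sketch below reads them from an object shaped like the browser's PerformanceNavigationTiming entry. The function name, the output keys, and the sample values are assumptions made for this example, not the auditor's actual implementation.

```javascript
// Sketch: derive the three load metrics from a navigation-timing entry.
// The input mirrors the browser's PerformanceNavigationTiming fields; the
// function name and output keys are hypothetical.
function summarizeLoadMetrics(nav) {
  const start = nav.startTime ?? 0; // navigation entries start at 0
  return {
    ttfbMs: nav.responseStart - start, // first response byte received
    domContentLoadedMs: nav.domContentLoadedEventStart - start, // HTML parsed
    loadEventMs: nav.loadEventStart - start, // load event fired
  };
}

// Made-up sample values, in milliseconds since navigation start.
const sampleEntry = {
  startTime: 0,
  responseStart: 180,
  domContentLoadedEventStart: 950,
  loadEventStart: 1800,
};

const loadMetrics = summarizeLoadMetrics(sampleEntry);
console.log(loadMetrics);
```

In a Puppeteer crawl, an entry like `sampleEntry` could plausibly be obtained with `page.evaluate(() => performance.getEntriesByType('navigation')[0].toJSON())`, although the document does not say how the crawler collects it.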
Upon completion of the crawl, the system will have generated a comprehensive dataset containing the raw information for every discovered page. This output is then passed to the next stage of the workflow for in-depth SEO analysis.
* url: The absolute URL of the page.
* status_code: The HTTP status code (e.g., 200, 301, 404).
* html_content: The full HTML source of the rendered page.
* initial_load_metrics: An object containing TTFB, DOMContentLoaded, and Load Event times.
* discovered_internal_links: An array of new internal URLs found on this page.
* timestamp: The time the page was crawled.
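The discovered_internal_links field implies some normalization of the raw anchor targets. The following is a minimal sketch of one way to do that, assuming "internal" means "same origin as the crawled page"; the function name and the sample hrefs are hypothetical.

```javascript
// Sketch: turn raw href values from a rendered page into a deduplicated list
// of internal URLs. "Internal" is assumed to mean same-origin.
function discoverInternalLinks(pageUrl, hrefs) {
  const page = new URL(pageUrl);
  page.hash = "";
  const found = new Set();
  for (const href of hrefs) {
    let resolved;
    try {
      resolved = new URL(href, page); // resolves relative links
    } catch {
      continue; // skip malformed hrefs
    }
    if (resolved.origin !== page.origin) continue; // external, mailto:, etc.
    resolved.hash = ""; // fragments point into an already-known page
    if (resolved.href === page.href) continue; // link back to the same page
    found.add(resolved.href);
  }
  return [...found];
}

const links = discoverInternalLinks("https://www.yourwebsite.com/blog/", [
  "/about",
  "post-1",
  "#comments",
  "https://www.yourwebsite.com/about#team",
  "https://external.example/page",
  "mailto:hello@yourwebsite.com",
]);
console.log(links);
```

A real crawler would also check each candidate against robots.txt before queueing it; that step is left out here to keep the sketch short.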
This raw data is the essential input for the subsequent SEO audit steps, providing the unadulterated content against which all 12 SEO checklist items will be evaluated.
The collected raw page data will now be fed into Step 2: SEO Data Extraction & Analysis (Gemini). In this next phase, the HTML content of each page will be parsed, and a dedicated AI model (Gemini) will meticulously extract and analyze the specific SEO elements (meta tags, H1s, alt text, etc.) required for your comprehensive audit.
This document details the completion of Step 2: hive_db → diff for your Site SEO Auditor workflow. This crucial step involves generating a comprehensive "before-and-after" difference report by comparing your latest SEO audit results with the previously stored audit data in our hive_db (MongoDB).
hive_db → diff
Purpose: The primary goal of this step is to provide a clear, actionable comparison between your site's current SEO performance and its previous state. By identifying changes, improvements, and regressions, we enable a proactive approach to SEO maintenance and optimization.
Mechanism:
* Retrieve the latest SiteAuditReport from MongoDB, which was generated in the previous step by the headless crawler.
* Retrieve the previous SiteAuditReport for your domain from MongoDB.
The generated diff report provides a granular comparison across all pages and the 12-point SEO checklist, highlighting:
* New issues introduced since the previous audit.
    * Example: Page /blog/new-post now has a duplicate meta title.
* Issues resolved since the previous audit.
    * Example: Page /products/item-x previously had missing image alt tags, which are now all present.
* Core Web Vitals: Pages where LCP, CLS, or FID scores have improved or deteriorated (e.g., LCP for /homepage went from 2.8s to 3.5s).
* Internal Link Density: Pages where the number of internal links significantly increased or decreased.
* Meta Title/Description Uniqueness: Identification of new duplicate titles/descriptions or resolution of existing ones.
* H1 Presence: Pages where an H1 was added or removed.
* Image Alt Coverage: Pages with newly missing alt attributes or pages where alt attributes were added.
* Canonical Tags: Pages with newly missing, incorrect, or newly correctly implemented canonical tags.
* Open Graph Tags: Pages with new issues or fixes related to Open Graph metadata.
* Structured Data Presence: Pages where structured data was added or removed.
* Mobile Viewport: Pages that gained or lost proper mobile viewport configuration.
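The new/resolved split described above can be thought of as a set difference between two issue lists. The sketch below is one possible formulation, assuming an issue is identified by its page URL plus checklist item; that keying scheme, the field names, and the sample data are illustrative only.

```javascript
// Sketch: classify issues as new or resolved by comparing two audits.
// Keying an issue by "pageUrl::check" is an assumption for this example.
function diffIssues(previousIssues, currentIssues) {
  const key = (issue) => `${issue.pageUrl}::${issue.check}`;
  const previousKeys = new Set(previousIssues.map(key));
  const currentKeys = new Set(currentIssues.map(key));
  return {
    newIssues: currentIssues.filter((i) => !previousKeys.has(key(i))),
    resolvedIssues: previousIssues.filter((i) => !currentKeys.has(key(i))),
    netChange: currentIssues.length - previousIssues.length,
  };
}

// Hypothetical issue lists from two consecutive audits.
const previousIssues = [
  { pageUrl: "/products/item-x", check: "imageAltCoverage" },
  { pageUrl: "/contact", check: "h1Presence" },
];
const currentIssues = [
  { pageUrl: "/contact", check: "h1Presence" },
  { pageUrl: "/blog/new-post", check: "metaTitleUniqueness" },
];

const issueDiff = diffIssues(previousIssues, currentIssues);
console.log(issueDiff);
```

An issue that appears in both audits (here the one on /contact) lands in neither list, which is why the net change can be zero even when individual issues have come and gone.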
This comprehensive diff report provides immediate, actionable insights, and its output feeds the subsequent steps in the workflow. An example report follows.
## Site SEO Audit Diff Report: [Your Domain] - [Current Date] vs. [Previous Audit Date]
### Overall Summary
* **Total Issues Detected (Current Audit):** 85
* **Total Issues Detected (Previous Audit):** 92
* **Net Change in Issues:** -7 (Improvement)
* **New Issues Introduced:** 3
* **Issues Resolved:** 10
* **Metrics Improved:** 5 pages (e.g., LCP)
* **Metrics Regressed:** 1 page (e.g., CLS)
### New Issues Detected (Current Audit Only)
* **Page: `/new-product-launch`**
    * **Issue:** Missing H1 Tag
    * **Issue:** Missing Open Graph `og:image`
* **Page: `/blog/latest-news`**
    * **Issue:** Duplicate Meta Title (identical to `/blog/archive`)
### Issues Resolved (Fixed Since Last Audit)
* **Page: `/about-us`**
    * **Resolved:** All Images Now Have Alt Attributes
    * **Resolved:** Correct Canonical Tag Implemented
* **Page: `/contact`**
    * **Resolved:** LCP Score Improved from 3.8s to 2.1s
* **Page: `/privacy-policy`**
    * **Resolved:** Structured Data (Organization Schema) Added
### Metric Changes
* **Core Web Vitals - Improvements:**
    * `/homepage`: LCP improved from 2.5s to 1.8s
    * `/category/electronics`: CLS improved from 0.15 to 0.08
* **Core Web Vitals - Regressions:**
    * `/product/item-z`: FID regressed from 50ms to 120ms (Needs Investigation)
* **Internal Link Density:**
    * `/resources`: Internal links increased from 15 to 28 (Positive Change)
    * `/old-blog-post`: Internal links decreased from 10 to 3 (Potential Issue)
### Detailed Page-Level Changes
**Page: `/products/featured-item`**
* **Previous Status:** H1 Present, All Images Alted, LCP: 2.2s
* **Current Status:** H1 Present, **1 Image Alt Missing**, LCP: 2.2s
* **Change:** New Issue - Missing Alt Tag for image `product-image-id-xyz.jpg`
**Page: `/blog/guide-to-seo`**
* **Previous Status:** Missing Canonical Tag, No Structured Data
* **Current Status:** Canonical Tag `https://yourdomain.com/blog/guide-to-seo` Present, Article Schema Structured Data Present
* **Change:** Resolved Issue - Canonical Tag Implemented. New Feature - Structured Data Added.
---
This detailed diff report ensures that you are always informed about the dynamic changes in your site's SEO landscape, enabling timely interventions and continuous improvement.
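For the Core Web Vitals entries in the sample report, "improved" and "regressed" follow from the fact that lower values are better for LCP, CLS, and FID. The sketch below shows that rule under stated assumptions: the function names and the optional tolerance parameter are hypothetical, and the rule does not apply to metrics such as internal link counts, where direction alone does not indicate quality.

```javascript
// Sketch: label a Core Web Vitals change. Lower is better for LCP, CLS and
// FID, so a decrease counts as an improvement.
function classifyMetricChange(oldValue, newValue, tolerance = 0) {
  const delta = newValue - oldValue;
  if (Math.abs(delta) <= tolerance) return "Unchanged";
  return delta < 0 ? "Improved" : "Regressed";
}

// Builds an entry shaped like the sample report's metric changes.
function toMetricChange(pageUrl, metric, oldValue, newValue) {
  return {
    pageUrl,
    metric,
    oldValue,
    newValue,
    change: classifyMetricChange(oldValue, newValue),
  };
}

// Values taken from the sample report above.
const metricChanges = [
  toMetricChange("/homepage", "LCP", 2.5, 1.8),
  toMetricChange("/category/electronics", "CLS", 0.15, 0.08),
  toMetricChange("/product/item-z", "FID", 50, 120),
];
console.log(metricChanges.map((c) => `${c.pageUrl} ${c.metric}: ${c.change}`));
```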
Explanation of the hero image (LCP) fix:
* Image Format: Suggests converting to .webp for better compression.
* `loading="eager"` & `fetchpriority="high"`: Explicitly tells the browser to load this critical image immediately and with high priority.
* `width` & `height`: Provides intrinsic dimensions so the browser can reserve space and prevent layout shifts.
* `srcset` & `sizes`: Implements responsive images, serving appropriately sized images for different viewport widths and reducing unnecessary data transfer.
* `preload`: Instructs the browser to fetch the image even earlier in the rendering process.
Together, these changes target the LCP metric by getting the hero image requested sooner and delivered in fewer bytes, which improves perceived page load speed.
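The `srcset` and `sizes` values in the hero image fix follow a regular pattern, so they could be generated rather than hand-written. The helper below is a hypothetical sketch (the function name and the variant description format are assumptions) that reproduces the attribute values used in the example fix.

```javascript
// Sketch: build srcset/sizes strings from a list of image variants.
// Each variant names a filename suffix and its pixel width.
function buildResponsiveAttrs(basePath, variants) {
  const sorted = [...variants].sort((a, b) => a.width - b.width);
  const srcset = sorted
    .map((v) => `${basePath}${v.suffix}.webp ${v.width}w`)
    .join(", ");
  // Every variant except the largest gets a max-width condition;
  // the largest acts as the default slot size.
  const sizes = sorted
    .map((v, i) =>
      i < sorted.length - 1
        ? `(max-width: ${v.width}px) ${v.width}px`
        : `${v.width}px`
    )
    .join(", ");
  return { srcset, sizes };
}

const heroAttrs = buildResponsiveAttrs("/images/hero-banner-full", [
  { suffix: "-small", width: 600 },
  { suffix: "-medium", width: 1200 },
  { suffix: "", width: 1920 },
]);
console.log(heroAttrs.srcset);
console.log(heroAttrs.sizes);
```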
The output from this "gemini → batch_generate" step (all identified issues with their corresponding Gemini-generated fixes) will now be:
* Stored in the SiteAuditReport document for your site, providing a comprehensive record of issues and their proposed solutions.
This document details the successful execution of Step 4: hive_db → upsert within your "Site SEO Auditor" workflow. This crucial step ensures that all collected audit data, including detailed findings, AI-generated fixes, and historical comparisons, is securely and persistently stored in your dedicated database.
The hive_db → upsert operation takes the comprehensive SiteAuditReport generated by the headless crawler and AI analysis and stores it within your PantheraHive database. An upsert handles both storage cases:
* If a matching document already exists, it is updated.
* If no matching document exists, a new one is inserted.
This mechanism is vital for maintaining a complete, evolving history of your site's SEO performance.
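To make the upsert behavior concrete, here is a small in-memory sketch that uses a `Map` as a stand-in for the collection. It is not the hive_db implementation; the function name, the key, and the sample fields are assumptions chosen for illustration.

```javascript
// Sketch: upsert semantics against an in-memory Map standing in for a
// database collection. Returns which branch was taken.
function upsert(store, key, doc) {
  const existing = store.get(key);
  if (existing) {
    store.set(key, { ...existing, ...doc }); // update: merge new fields
    return "updated";
  }
  store.set(key, { ...doc }); // insert: no document with this key yet
  return "inserted";
}

const store = new Map();
const firstWrite = upsert(store, "audit-001", {
  siteUrl: "https://www.yourwebsite.com",
  totalIssuesFound: 35,
});
const secondWrite = upsert(store, "audit-001", { totalIssuesFound: 30 });
console.log(firstWrite, secondWrite, store.get("audit-001"));
```

With the MongoDB Node.js driver, the same effect is usually achieved by passing `{ upsert: true }` to `updateOne(filter, update, options)`; which filter the workflow actually matches on is not stated in this document.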
SiteAuditReport Structure
The following detailed structure represents the SiteAuditReport document that has been successfully upserted into your MongoDB instance. Each field is designed to provide actionable insights and track changes over time.
{
"_id": "65e0a7b2c3d4e5f6a7b8c9d0", // Unique identifier for this audit report
"siteUrl": "https://www.yourwebsite.com", // The root URL of the audited site
"auditTimestamp": "2024-02-29T02:00:00.000Z", // UTC timestamp of when the audit was completed
"auditType": "scheduled", // "scheduled" or "on-demand"
"pagesAuditedCount": 150, // Total number of unique pages crawled and audited
"overallStatus": "Needs Improvement", // Overall status: "Pass", "Fail", "Needs Improvement"
"auditSummary": {
"totalIssuesFound": 35,
"criticalIssuesFound": 7,
"pagesWithIssues": 28,
"pagesWithGeminiFixes": 15
},
"pageReports": [ // Array of detailed reports for each audited page
{
"pageUrl": "https://www.yourwebsite.com/product/example-product",
"issueCount": 3,
"criticalIssueCount": 1,
"seoMetrics": {
"metaTitle": {
"status": "fail",
"value": "Example Product",
"issueDetails": "Meta title is too short (15 chars). Recommended: 50-60 chars.",
"geminiFix": "Consider updating the meta title to: 'Example Product Name - Buy Online | Your Brand' (46 chars), extending it further to reach the recommended 50-60 chars."
},
"metaDescription": {
"status": "pass",
"value": "Discover our amazing example product...",
"issueDetails": null,
"geminiFix": null
},
"h1Presence": {
"status": "pass",
"value": "Example Product Name",
"issueDetails": null,
"geminiFix": null
},
"imageAltCoverage": {
"status": "fail",
"details": [
{
"src": "/images/product-hero.jpg",
"alt": "",
"issue": "Missing alt attribute for critical image."
},
{
"src": "/images/logo.png",
"alt": "Your Brand Logo",
"issue": null
}
],
"geminiFix": "Add descriptive alt text to '/images/product-hero.jpg', e.g., 'Close-up of Example Product in blue'."
},
"internalLinkDensity": {
"status": "pass",
"count": 12,
"issueDetails": null
},
"canonicalTag": {
"status": "pass",
"value": "https://www.yourwebsite.com/product/example-product",
"issueDetails": null,
"geminiFix": null
},
"openGraphTags": {
"status": "fail",
"details": {
"og:title": "Example Product",
"og:description": "Discover our amazing example product...",
"og:image": null // Missing
},
"issueDetails": "Missing `og:image` tag, which can impact social sharing previews.",
"geminiFix": "Add an `og:image` tag pointing to a high-quality image (e.g., 'https://www.yourwebsite.com/images/og-product.jpg')."
},
"coreWebVitals": {
"LCP": { "value": 3.2, "status": "fail" }, // Largest Contentful Paint (seconds)
"CLS": { "value": 0.15, "status": "fail" }, // Cumulative Layout Shift
"FID": { "value": 55, "status": "pass" }, // First Input Delay (milliseconds)
"issueDetails": "LCP and CLS values are above recommended thresholds, indicating poor page load performance and layout instability.",
"geminiFix": "Optimize image sizes and avoid lazy loading above-the-fold content (reserve lazy loading for below-the-fold images) to improve LCP. Investigate layout shifts caused by dynamic content loading to reduce CLS."
},
"structuredDataPresence": {
"status": "pass",
"typesFound": ["Product", "BreadcrumbList"],
"issueDetails": null,
"geminiFix": null
},
"mobileViewport": {
"status": "pass",
"value": "<meta name='viewport' content='width=device-width, initial-scale=1'>",
"issueDetails": null,
"geminiFix": null
}
},
"geminiFixesGenerated": [ // List of specific fixes generated by Gemini for this page
"Update meta title for better length and keywords.",
"Add alt text to product hero image.",
"Implement og:image for improved social sharing.",
"Address LCP and CLS issues for better page performance."
]
}
// ... more page reports
],
"diffReport": { // Detailed comparison with the previous audit report
"previousAuditId": "65dff6a1b2c3d4e5f6a7b8c9", // Reference to the previous audit document
"newIssues": [ // Issues found in THIS audit that were NOT present in the previous one
{
"pageUrl": "https://www.yourwebsite.com/new-blog-post",
"metric": "h1Presence",
"issue": "Missing H1 tag."
}
],
"resolvedIssues": [ // Issues present in the PREVIOUS audit that are NO LONGER present
{
"pageUrl": "https://www.yourwebsite.com/old-product",
"metric": "metaDescription",
"issue": "Duplicate meta description."
}
],
"metricChanges": [ // Significant changes in key metrics (e.g., CWV improvements/degradations)
{
"pageUrl": "https://www.yourwebsite.com/homepage",
"metric": "coreWebVitals.LCP",
"oldValue": 4.1,
"newValue": 2.8,
"change": "Improved"
}
]
},
"reportGeneratedBy": "PantheraHive Site SEO Auditor"
}
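The pass/fail statuses under coreWebVitals in the document above are consistent with Google's published "good" thresholds (LCP at most 2.5 s, CLS at most 0.1, FID at most 100 ms). The sketch below applies those thresholds; whether the auditor uses exactly these cut-offs is an assumption, and the function and constant names are hypothetical.

```javascript
// Sketch: derive a pass/fail status from the "good" Core Web Vitals
// thresholds. Units: LCP in seconds, CLS unitless, FID in milliseconds.
const GOOD_THRESHOLDS = { LCP: 2.5, CLS: 0.1, FID: 100 };

function cwvStatus(metric, value) {
  const limit = GOOD_THRESHOLDS[metric];
  if (limit === undefined) {
    throw new Error(`Unknown metric: ${metric}`);
  }
  return value <= limit ? "pass" : "fail";
}

// Values from the sample SiteAuditReport above.
const sampleStatuses = {
  LCP: cwvStatus("LCP", 3.2),
  CLS: cwvStatus("CLS", 0.15),
  FID: cwvStatus("FID", 55),
};
console.log(sampleStatuses);
```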
* By storing each SiteAuditReport, you gain a clear timeline of your site's SEO performance, allowing you to track improvements or regressions over time.
* The diffReport provides immediate insights into what has changed since the last audit, highlighting new issues or confirming the resolution of previous ones.
* Failing checks include a geminiFix, offering concrete, AI-generated recommendations directly within your audit report.
The SiteAuditReport for https://www.yourwebsite.com (or the siteUrl specified in your input) generated from the latest audit run has been successfully upserted into the hive_db. This means:
* The _id for the new or updated document is 65e0a7b2c3d4e5f6a7b8c9d0 (example ID).
You can now access this comprehensive report through the PantheraHive UI or directly query your database for detailed insights and actionable fixes.
With the data successfully stored, the final step involves presenting this information to you in an easily digestible and actionable format.
hive_db → conditional_update - Site SEO Auditor Report Storage and Diffing
This final step in the "Site SEO Auditor" workflow is responsible for persistently storing the comprehensive SEO audit report within your dedicated MongoDB instance (hive_db) and executing a conditional update. This ensures that all audit results, including detailed findings, AI-generated fixes, and a crucial before-and-after comparison, are securely saved and accessible for review and tracking.
The conditional_update operation serves several critical functions:
* Persistent storage: Saves the complete SiteAuditReport document in MongoDB.
SiteAuditReport Document Structure
The audit results are stored as a SiteAuditReport document in a dedicated site_audit_reports collection within your hive_db. The document structure is designed to be comprehensive and facilitate easy querying and analysis.
{
"_id": ObjectId("..."),
"auditId": "uuid-for-this-audit-run",
"siteUrl": "https://www.example.com",
"auditTimestamp": ISODate("2023-10-27T02:00:00.000Z"),
"status": "completed", // or "failed"
"overallScore": 85, // Aggregate SEO score (0-100)
"overallSummary": "Good overall SEO health with minor improvements needed for image alt attributes and Core Web Vitals on specific pages.",
"pagesAudited": [
{
"pageUrl": "https://www.example.com/",
"pageScore": 90,
"metaTitle": {
"present": true,
"unique": true,
"length": 65,
"issue": null,
"fix": null
},
"metaDescription": {
"present": true,
"unique": true,
"length": 155,
"issue": null,
"fix": null
},
"h1Presence": {
"present": true,
"count": 1,
"issue": null,
"fix": null
},
"imageAltCoverage": {
"coveredPercentage": 80,
"totalImages": 10,
"imagesMissingAlt": 2,
"issue": "2 images missing alt text.",
"fix": "Add descriptive alt text to images with src: /img/logo.png, /img/banner.jpg."
},
"internalLinkDensity": {
"count": 25,
"issue": null,
"fix": null
},
"canonicalTags": {
"present": true,
"correct": true,
"issue": null,
"fix": null
},
"openGraphTags": {
"present": true,
"correct": true,
"issue": null,
"fix": null
},
"coreWebVitals": {
"LCP": 2.1, // seconds
"CLS": 0.05, // score
"FID": 50 // ms
},
"structuredDataPresence": {
"present": true,
"types": ["Organization", "WebPage"],
"issue": null,
"fix": null
},
"mobileViewport": {
"configured": true,
"issue": null,
"fix": null
},
"brokenElements": [
{
"elementSelector": "img[src='/img/banner.jpg']",
"issueDescription": "Image missing 'alt' attribute.",
"geminiFix": "Add `alt=\"Descriptive text for banner image\"` to the `<img>` tag."
}
]
}
// ... additional pages
],
"previousAuditId": "uuid-of-last-completed-audit", // Reference to the previous audit
"diffReport": {
"overallChanges": "Overall score decreased by 5 points. New issues detected on /blog page.",
"pageLevelChanges": [
{
"pageUrl": "https://www.example.com/blog",
"metricsChanged": ["metaTitle_length_increased", "h1Presence_missing_new"],
"detailedDiff": {
"metaTitle": { "old": 50, "new": 70 },
"h1Presence": { "old": true, "new": false, "issue_new": "H1 tag is now missing." }
}
}
// ... more page-level changes
]
},
"nextScheduledRun": ISODate("2023-11-03T02:00:00.000Z") // Next automated run timestamp
}
The conditional_update process follows these steps:
Retrieve the previous audit: The system queries the site_audit_reports collection to find the most recent completed audit for the siteUrl. This is crucial for generating the diffReport.
* Condition: {"siteUrl": "YOUR_SITE_URL", "status": "completed"}
* Sort: {"auditTimestamp": -1} (to get the latest)
Generate the diff report:
* If a previousAuditId is found: A detailed comparison is performed between the current audit's pagesAudited array and the pagesAudited array of the previous audit.
    * Metrics Compared: Uniqueness, presence, length (for meta tags), counts (for H1, links), percentages (for alt coverage), and Core Web Vitals scores.
    * Change Detection: Identifies improvements, regressions, and new issues.
    * diffReport Population: The diffReport field in the new SiteAuditReport document is populated with a summary of overall changes and specific page-level changes, including detailed metric differences.
* If no previous audit is found (first run), the diffReport will indicate "No previous audit found for comparison."
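The comparison described above could be sketched as follows. This is a simplified illustration, not the workflow's actual code: it compares only a handful of fields, the `COMPARED_FIELDS` list and function name are assumptions, and the output loosely follows the diffReport shape shown in the document structure.

```javascript
// Sketch: build a diffReport by comparing pagesAudited arrays.
// Only a few (check, field) pairs are compared; the list is illustrative.
const COMPARED_FIELDS = [
  ["metaTitle", "length"],
  ["h1Presence", "present"],
  ["imageAltCoverage", "coveredPercentage"],
];

function buildDiffReport(previousAudit, currentAudit) {
  if (!previousAudit) {
    // First run: nothing to compare against.
    return {
      overallChanges: "No previous audit found for comparison.",
      pageLevelChanges: [],
    };
  }
  const previousByUrl = new Map(
    previousAudit.pagesAudited.map((p) => [p.pageUrl, p])
  );
  const pageLevelChanges = [];
  for (const page of currentAudit.pagesAudited) {
    const before = previousByUrl.get(page.pageUrl);
    if (!before) continue; // page is new in this audit
    const detailedDiff = {};
    for (const [check, field] of COMPARED_FIELDS) {
      const oldValue = before[check]?.[field];
      const newValue = page[check]?.[field];
      if (oldValue !== newValue) {
        detailedDiff[check] = { old: oldValue, new: newValue };
      }
    }
    if (Object.keys(detailedDiff).length > 0) {
      pageLevelChanges.push({ pageUrl: page.pageUrl, detailedDiff });
    }
  }
  return {
    overallChanges: `${pageLevelChanges.length} page(s) changed.`,
    pageLevelChanges,
  };
}

// Hypothetical audits mirroring the /blog example in the document structure.
const previousAudit = {
  pagesAudited: [
    {
      pageUrl: "https://www.example.com/blog",
      metaTitle: { length: 50 },
      h1Presence: { present: true },
    },
  ],
};
const currentAudit = {
  pagesAudited: [
    {
      pageUrl: "https://www.example.com/blog",
      metaTitle: { length: 70 },
      h1Presence: { present: false },
    },
  ],
};

const diffReport = buildDiffReport(previousAudit, currentAudit);
const firstRunReport = buildDiffReport(null, currentAudit);
console.log(JSON.stringify(diffReport, null, 2));
```

Pages that exist only in the current audit are skipped here; a fuller version would probably report them as newly discovered rather than ignore them.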
Assemble the document: The new SiteAuditReport document is assembled, including all audit findings, the previousAuditId (if applicable), and the newly generated diffReport. The status is set to completed.
Insert the document:
* The system performs an insert operation for the new SiteAuditReport document.
* Conditional Aspect: If an audit with the same auditId already exists (e.g., due to a retry mechanism), the system can be configured to either fail gracefully or perform an upsert to update the existing document, ensuring idempotency. For this workflow, a new document is generally inserted for each unique audit run.
Schedule the next run: The nextScheduledRun field is calculated as the current auditTimestamp plus 7 days, which keeps the weekly automation on schedule.
Collection: site_audit_reports
Operations:
* find(): To retrieve the latest previous audit.
* insertOne(): To store the new SiteAuditReport document.
Indexes:
* It is highly recommended to have indexes on siteUrl, auditTimestamp, and status to optimize query performance for retrieving previous audits and general reporting.
    * db.site_audit_reports.createIndex({"siteUrl": 1, "auditTimestamp": -1})
    * db.site_audit_reports.createIndex({"status": 1})
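The lookup and scheduling logic above can be emulated without a database. The sketch below filters and sorts a plain array in the same way the find() condition and sort would, and computes nextScheduledRun as auditTimestamp plus seven days. Function names and the sample reports are assumptions for illustration.

```javascript
// Sketch: pick the latest completed audit for a site from an in-memory list,
// mirroring the filter {siteUrl, status: "completed"} and sort by
// auditTimestamp descending.
function findLatestCompleted(reports, siteUrl) {
  const matches = reports
    .filter((r) => r.siteUrl === siteUrl && r.status === "completed")
    .sort((a, b) => b.auditTimestamp - a.auditTimestamp);
  return matches[0] ?? null;
}

// Sketch: next run is a fixed number of days after the audit timestamp.
function nextScheduledRun(auditTimestamp, intervalDays = 7) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return new Date(auditTimestamp.getTime() + intervalDays * msPerDay);
}

// Hypothetical stored reports.
const reports = [
  { auditId: "a1", siteUrl: "https://www.example.com", status: "completed", auditTimestamp: new Date("2023-10-13T02:00:00.000Z") },
  { auditId: "a2", siteUrl: "https://www.example.com", status: "completed", auditTimestamp: new Date("2023-10-20T02:00:00.000Z") },
  { auditId: "a3", siteUrl: "https://www.example.com", status: "failed", auditTimestamp: new Date("2023-10-27T02:00:00.000Z") },
  { auditId: "b1", siteUrl: "https://other.example", status: "completed", auditTimestamp: new Date("2023-10-28T02:00:00.000Z") },
];

const latest = findLatestCompleted(reports, "https://www.example.com");
const nextRun = nextScheduledRun(new Date("2023-10-27T02:00:00.000Z"));
console.log(latest.auditId, nextRun.toISOString());
```

Against a real collection, the same selection would typically be expressed as a find() with the condition above, a descending sort on auditTimestamp, and a limit of 1, which the compound index on siteUrl and auditTimestamp is well suited to serve.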
Robust error handling is implemented for this step:
* Database connection: If a connection to hive_db cannot be established, the step will fail, and an alert will be triggered.
* Write failures: Any error during the insertOne operation (e.g., network issues, permission errors) will result in a step failure.
* Diff generation: If diff generation fails, the diffReport might indicate an error, but the core audit data will still be saved.
Upon successful completion of this step, the following actions and deliverables are available:
* The new SiteAuditReport document is now stored in your hive_db and can be accessed through:
    * PantheraHive UI: A dedicated "Site SEO Reports" section will display a list of all audits for your sites, allowing you to view each report, its summary, and the before/after diff.
    * Direct Database Access: You can query the site_audit_reports collection in MongoDB to retrieve raw audit data.
    * PantheraHive API: An API endpoint will be available to programmatically fetch audit reports.
* The next audit is scheduled for the nextScheduledRun timestamp.
This step ensures that your SEO performance is consistently monitored and documented, and that actionable insights are readily available through comprehensive reporting and historical diffs.