Website Content Scraper for ARN Sites

A web scraping tool built for CSS extraction and optimization.

Streamlining CSS Cleanup

As a frontend developer, I enjoy challenges that combine creativity with technical problem-solving, especially cleaning up and modernizing legacy code. Recently, I was tasked with refactoring one of our flagship websites, only to discover that its CSS had spiraled into chaos. Multiple files contained redundant or overlapping styles, and no shared coding standards tied them together. Consistency was hard to maintain, and even small changes risked unexpected side effects elsewhere on the site. From a design perspective, it was nearly impossible to ensure brand fidelity across pages; from a development standpoint, developers spent more time searching through tangled CSS than building new features. With these problems in view, I set out to clean up the CSS in order to reduce file bloat, improve site performance, and establish best practices for the team.

The Challenge: Dualistic Frontend Environment

Smalltalk + “Skins” = Complexity

While standard CSS cleanup tools like PostCSS and its ecosystem of plugins offer powerful optimization capabilities, our unique frontend architecture presented additional challenges. The markup was generated server-side using Smalltalk/Seaside, making it difficult to directly access and analyze the HTML structure. This setup was further complicated by our "skins" system: separate collections of HTML partials, CSS, and JavaScript files that could be swapped out. Without easy access to the complete DOM and with minimal brand documentation, each skin developed its own interpretation of common elements, leading to rampant style duplication and inconsistency.

The Solution: A Custom CSS Cleanup Tool

My response was to develop Clean-CSS, a Node.js and Puppeteer-based tool that partially automates CSS extraction and cleanup. It is not intended to be a full-fledged CSS optimization tool; instead, it gives developers a faster way to identify redundancies and trim excess code without spending hours going line by line to decide what to keep and what to discard.

  1. Automated Login and Session Handling

    • Logs into the reservation funnel automatically to reach pages that sit behind authentication.
  2. Full Funnel Traversal

    • Navigates each step of the reservation process, collecting relevant HTML and CSS artifacts.
  3. Extraction of Static Pages

    • Customizable list of static pages ensures a comprehensive snapshot of site-wide styles.
  4. Advanced CSS Processing

    • Unused Style Removal: Scrubs out irrelevant classes and rules.
    • Media Query Consolidation: Merges rules under repeated breakpoints into a single media query block for tidier CSS.
    • Duplicate Style Removal: De-duplicates overlapping rules.
    • Rule Merging: Combines similar selectors to shrink CSS footprints.
    • Asset URL Normalization: Standardizes paths for more predictable asset management.
  5. Flexible JSON Configuration

    • Allows fine-grained control over the scraping process, making the tool easily adaptable to various site configurations.
  6. Error Tracking and Retry

    • Retries failed steps and records errors, so a run can continue past transient failures and problems are easier to trace.
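
To make items 5 and 6 above more concrete, here is a minimal sketch of config-driven page collection with retry and error tracking. It is an illustration under stated assumptions, not Clean-CSS's actual code: the `config` field names, `withRetry`, `collectPages`, and the injected `visitPage` function are all hypothetical. In a real run, `visitPage` would presumably wrap the Puppeteer navigation; here it is injected so the sketch stays dependency-free.

```javascript
// Hypothetical configuration; field names are illustrative, not the tool's real schema.
const config = {
  staticPages: ["/about", "/contact"],
  retry: { attempts: 3, delayMs: 1000 },
};

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async task, retrying on failure up to `attempts` times in total.
async function withRetry(task, { attempts, delayMs }) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError;
}

// Visit every configured page. A page that still fails after all retries
// is recorded in `errors` instead of aborting the whole run.
// `visitPage` is an assumption: any async function that returns a page's HTML.
async function collectPages(cfg, visitPage) {
  const results = [];
  const errors = [];
  for (const path of cfg.staticPages) {
    try {
      const html = await withRetry(() => visitPage(path), cfg.retry);
      results.push({ path, html });
    } catch (error) {
      errors.push({ path, message: error.message });
    }
  }
  return { results, errors };
}
```

Calling `collectPages(config, visitPage)` returns both the collected pages and a list of failures, which is one way to keep a long scrape running while still leaving a trail for troubleshooting.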

Results: Reduced Redundancy and Smoother Development

  • Time Savings: With repeated CSS patterns identified and removed, developers spend far less time hunting down and merging styles, which speeds up feature rollouts.
  • Consistent Codebase: Even with minimal brand guidelines, the CSS that remains after a Clean-CSS pass is coherent and aligned across the entire site.
  • Stable Look and Feel: Lower risk of small tweaks reverberating across the site, preserving the site’s brand aesthetic more reliably.
  • Developer Confidence: Cleaner code leads to better morale and fosters bolder improvements, rather than timid modifications that might break hidden styles.

Technical Skills on Display

  • Web Scraping & Automation: Employed Node.js, Puppeteer, and custom scripts to automate login flows and gather HTML/CSS data.
  • CSS Optimization: Leveraged PostCSS and specialized plugins for deduplication, minification, and modular style management.
  • Self-Directed Refactoring: With no direct collaboration from designers or other developers, I shaped the cleanup strategy based on user experience needs and minimal brand references.
  • Process Standardization: Established straightforward “skin” guidelines to ensure future CSS additions remain organized and consistent.
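
The deduplication and rule merging mentioned above can be illustrated with a small, dependency-free sketch. This is not the PostCSS pipeline the tool uses; it swaps in plain JavaScript over a hypothetical rule shape (`media`, `selector`, `declarations`) so the idea is visible without any plugins. The function names are assumptions made for this example.

```javascript
// Hypothetical rule shape:
// { media: string | null, selector: string, declarations: [[property, value], ...] }

// Build a stable key for a declaration list so identical bodies compare equal.
function declarationKey(declarations) {
  return declarations.map(([prop, value]) => `${prop}:${value}`).join(";");
}

// Drop rules that exactly repeat an earlier rule (same media, selector, declarations).
function removeDuplicates(rules) {
  const seen = new Set();
  return rules.filter((rule) => {
    const key = JSON.stringify([rule.media, rule.selector, declarationKey(rule.declarations)]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Merge rules that share a media context and an identical declaration body
// into one rule with a list of selectors.
function mergeRules(rules) {
  const merged = new Map();
  for (const rule of rules) {
    const key = JSON.stringify([rule.media, declarationKey(rule.declarations)]);
    const existing = merged.get(key);
    if (existing) {
      existing.selectors.push(rule.selector);
    } else {
      merged.set(key, { media: rule.media, selectors: [rule.selector], declarations: rule.declarations });
    }
  }
  return [...merged.values()];
}

// Emit CSS text, grouping rules so that each media query appears only once.
function serialize(mergedRules) {
  const byMedia = new Map();
  for (const rule of mergedRules) {
    const body = rule.declarations.map(([prop, value]) => `${prop}: ${value};`).join(" ");
    const text = `${rule.selectors.join(", ")} { ${body} }`;
    if (!byMedia.has(rule.media)) byMedia.set(rule.media, []);
    byMedia.get(rule.media).push(text);
  }
  const chunks = [];
  for (const [media, texts] of byMedia) {
    chunks.push(media === null ? texts.join("\n") : `@media ${media} {\n  ${texts.join("\n  ")}\n}`);
  }
  return chunks.join("\n");
}

function cleanCss(rules) {
  return serialize(mergeRules(removeDuplicates(rules)));
}
```

One caveat on this design: grouping rules by declaration body or media query changes their order, and order can affect the cascade when selectors have equal specificity. A sketch like this is best treated as a way to surface candidates for review rather than a safe automatic rewrite.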

Looking Ahead: Accelerating CSS Development

Clean-CSS transforms how developers approach CSS maintenance and cleanup. With this tool, teams can:

  • Slash Cleanup Time: What used to take days of manual CSS review can now be accomplished in hours through automated analysis and optimization.
  • Confidently Update Styles: Built-in safeguards prevent style conflicts and maintain visual consistency across the site.
  • Scale Without Complexity: As the codebase grows, rerunning Clean-CSS identifies and removes redundant styles, keeping CSS lean and manageable.
  • Standardize Development: The included style guidelines ensure all developers follow consistent patterns, preventing future CSS bloat.

The tool continues to evolve, with planned improvements focused on faster processing, expanded site coverage, and even more intelligent style optimization. For teams wrestling with complex CSS architectures, Clean-CSS offers a clear path to simpler, more maintainable stylesheets.

Final Thoughts

This CSS scraping and optimization tool marks an important step forward in managing complex stylesheets. Through intelligent automation and careful analysis, it empowers developers to confidently maintain and improve their CSS codebase while ensuring visual consistency across the entire site.