Hook: Your support tool doesn't expose comments via API, your users can't log in to check ticket status, and the vendor emails are noise. So you scrape it yourself.
Why It Matters: Domo's support portal keeps comment history locked behind an authenticated UI, with no dataset, no API, and no export. If you're an admin managing 20-30 open tickets for users who don't have portal access, you're the bottleneck. Scraping the portal programmatically lets you pipe that data into tools your team already uses, such as a project tracker, a Slack alert, or a Domo dataset itself. Once you have the raw data, you can build anything on top of it.
What You'll Learn
- Authenticate a headless Selenium browser against a login-protected support portal
- Use the fomo-library-extensions pip package to cut scraping boilerplate to roughly 30 lines
- Extract ticket data and comment history from a dynamically rendered page
- Structure the output as a dataset ready to load into Jira or another project management tool
- Understand where web scraping is appropriate and where it hammers infrastructure unnecessarily
Scraping Behind Auth: Patterns That Actually Work
The core challenge here isn't the scraping itself; it's authentication. Most tutorials show you how to scrape public pages. This one starts where they leave off: a username/password login screen that must be cleared before the driver can touch any data.
The approach uses Selenium's WebDriver to control a headed or headless browser that navigates to the portal, waits for the DOM to render, fills in the credentials, and clicks through the login flow. The fomo-library-extensions library (available on PyPI) wraps enough of this ceremony that the working scraper lands at around 30 lines, not because the problem is trivial but because the abstraction is well-scoped.
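A minimal sketch of that login flow, written against a plain Selenium 4 driver rather than fomo-library-extensions (whose API isn't covered here). Everything portal-specific is an assumption: the locators (username, password, button[type='submit']) are hypothetical, and so is treating "the URL changed" as proof the login worked. The wait_for helper is a small stand-in for Selenium's WebDriverWait(driver, timeout).until(...), kept in plain Python so the sketch runs without a browser attached.

```python
import time
from typing import Any, Callable

# Hypothetical locators for the portal's login form. Selenium 4 accepts the
# strategy as a plain string: By.ID == "id", By.CSS_SELECTOR == "css selector".
USERNAME_FIELD = ("id", "username")
PASSWORD_FIELD = ("id", "password")
SUBMIT_BUTTON = ("css selector", "button[type='submit']")


def wait_for(condition: Callable[[], Any], timeout: float = 15.0, poll: float = 0.25) -> Any:
    """Poll condition() until it returns something truthy, then return that value.

    A minimal stand-in for WebDriverWait(driver, timeout).until(...).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


def login(driver: Any, url: str, username: str, password: str, timeout: float = 15.0) -> None:
    """Fill and submit a username/password form using a Selenium WebDriver."""
    driver.get(url)
    # find_elements returns [] until the field exists, so polling it is an
    # explicit wait on the form rendering rather than a fixed sleep().
    username_box = wait_for(lambda: driver.find_elements(*USERNAME_FIELD), timeout)[0]
    username_box.send_keys(username)
    driver.find_element(*PASSWORD_FIELD).send_keys(password)
    driver.find_element(*SUBMIT_BUTTON).click()
    # Assumption: a successful login redirects away from the login URL.
    wait_for(lambda: driver.current_url != url, timeout)
```

In a real run, driver would be something like webdriver.Chrome(options=opts), with opts.add_argument("--headless=new") for headless mode, and the credentials would come from environment variables rather than the script. If the portal sits behind SSO or MFA, a simple form fill like this won't be enough.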
A few patterns worth noting from the implementation:
- Explicit waits over sleeps. Dynamic pages load content asynchronously. Hard-coded sleep() calls break when the page is slower than expected and waste time when it's faster; WebDriverWait with expected conditions polls until the element is actually ready, which is more resilient.
- Session reuse. Re-authenticating on every run is slow and looks suspicious to rate limiters. Persisting cookies or session state between runs reduces overhead.
- Structured output first. The goal isn't raw HTML but a clean dataset with ticket ID, status, and comment thread. Designing the output schema before writing the parser keeps the scraper focused.
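For session reuse, Selenium exposes driver.get_cookies(), which returns a list of cookie dicts, and driver.add_cookie(cookie) to restore one. The sketch below assumes a local JSON file is an acceptable place to keep them: it saves those dicts and reloads only the ones whose expiry timestamp hasn't passed. The helper names and file location are illustrative.

```python
from __future__ import annotations

import json
import time
from pathlib import Path


def save_cookies(cookies: list[dict], path: Path) -> None:
    """Write cookie dicts (e.g. from driver.get_cookies()) to a JSON file."""
    path.write_text(json.dumps(cookies, indent=2))


def load_cookies(path: Path, now: float | None = None) -> list[dict]:
    """Read saved cookies, dropping any whose 'expiry' (epoch seconds) has passed.

    Cookies without an 'expiry' key are session cookies and are kept.
    """
    if not path.exists():
        return []
    now = time.time() if now is None else now
    cookies = json.loads(path.read_text())
    return [c for c in cookies if c.get("expiry") is None or c["expiry"] > now]
```

To restore a session, open a page on the portal's domain first (add_cookie only accepts cookies for the domain currently loaded), add each saved cookie, call driver.refresh(), and fall back to the full login if you still land on the login page. The cookie file holds live session tokens, so store it with the same care as the password.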
The downstream target here is Jira, where internal users already live. The scraped dataset bridges the gap between a vendor portal with limited access control and a team that needs visibility without requiring new logins or vendor cooperation. The same pattern applies anywhere you have a UI-only data source and a legitimate need to move that data somewhere else.
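Designing the schema first might look like the sketch below: two dataclasses for a ticket and its comments, flattened to one CSV row per comment so the same file can feed a Domo dataset or Jira's CSV importer. The field names are assumptions about what the portal displays, and in Jira you would still map these columns to issue fields in the import wizard.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    created: str  # timestamp as the portal displays it; format is portal-specific
    body: str


@dataclass
class Ticket:
    ticket_id: str
    status: str
    subject: str
    comments: list[Comment] = field(default_factory=list)


FIELDNAMES = [
    "ticket_id", "status", "subject",
    "comment_index", "comment_author", "comment_created", "comment_body",
]


def to_rows(tickets: list[Ticket]) -> list[dict[str, str | int]]:
    """Flatten tickets to one row per comment; a ticket with no comments gets one row."""
    rows: list[dict[str, str | int]] = []
    for t in tickets:
        base = {"ticket_id": t.ticket_id, "status": t.status, "subject": t.subject}
        if not t.comments:
            rows.append({**base, "comment_index": 0, "comment_author": "",
                         "comment_created": "", "comment_body": ""})
        for i, c in enumerate(t.comments, start=1):
            rows.append({**base, "comment_index": i, "comment_author": c.author,
                         "comment_created": c.created, "comment_body": c.body})
    return rows


def to_csv(tickets: list[Ticket]) -> str:
    """Serialize the flattened rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(to_rows(tickets))
    return buf.getvalue()
```

One row per comment keeps the thread lossless; if the tracker wants one row per ticket, group by ticket_id afterwards and keep, say, the latest comment.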
One honest caveat from the video: scraping does put load on the target server. Throttle your requests, run on a schedule rather than continuously, and check the terms of service for whatever you're targeting.
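Throttling can be as simple as enforcing a minimum gap between page loads. The hypothetical throttled_map helper below does that; its 2-second default is an arbitrary placeholder, not a figure from the video. The clock and sleep parameters are there so the pacing can be tested without actually waiting.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

import time

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    func: Callable[[T], R],
    items: Iterable[T],
    min_interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call func on each item, leaving at least min_interval seconds between call starts."""
    results: List[R] = []
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Sleep only for whatever part of the interval hasn't already elapsed.
            remaining = min_interval - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        results.append(func(item))
    return results
```

For example, throttled_map(scrape_ticket, ticket_urls, min_interval=5), where scrape_ticket is your own per-ticket function (hypothetical here). Pair it with a scheduler such as cron running the script a few times a day, rather than a loop that never stops.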


