Project

brightdata

0.0
The project is in a healthy, maintained state
A typed, ergonomic Ruby client for Bright Data's Datasets v3 APIs, with LinkedIn endpoints.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
 Dependencies

Development

~> 5.0
~> 13.0
~> 1.86
~> 3.0
~> 0.9

Runtime

~> 2.6
 Project Readme

brightdata

A small, ergonomic Ruby client for Bright Data's Datasets v3 scraper APIs. Returns parsed results as immutable Data value objects with named readers. Version 0.1.0 ships the LinkedIn endpoints.

Installation

Add it to your Gemfile:

gem "brightdata"

Then run bundle install, or install it directly:

gem install brightdata

Requires Ruby 3.4.4 or newer.

Configuration

Create a client with your Bright Data API token:

client = BrightData::Client.new(api_token: ENV.fetch("BRIGHTDATA_API_TOKEN"))

Optional keyword arguments:

  • base_url: — override the API host (defaults to https://api.brightdata.com).
  • logger: — a Logger that receives one debug line per request.

Synchronous vs. asynchronous

Every endpoint exposes two methods:

  • #scrape(...) runs synchronously and returns parsed value-object results. Bright Data caps synchronous scrapes at 60 seconds; if a job exceeds that, scrape raises BrightData::ScrapeTimeoutError, which carries a resumable snapshot (error.snapshot).
  • #trigger(...) starts an asynchronous collection and returns a BrightData::Snapshot you poll with #wait.
# Synchronous
profiles = client.linkedin.profiles.scrape(
  urls: ["https://www.linkedin.com/in/example/"]
)
profiles.first.name # => "Example Person"

# Asynchronous
snapshot = client.linkedin.profiles.trigger(
  urls: ["https://www.linkedin.com/in/example/"]
)
result = snapshot.wait # blocks, polling progress until ready/failed/timeout
if result.success?
  result.payload # => Array<BrightData::LinkedIn::Types::Profile>
else
  result.error   # => raw failure payload from Bright Data
end

Snapshot#wait accepts timeout: (default 300s) and poll_interval: (default 5s), and raises BrightData::ScrapeTimeoutError if the deadline passes before the snapshot reaches a terminal state.

LinkedIn endpoints

Call Argument Returns
linkedin.profiles urls: Types::Profile
linkedin.companies urls: Types::Company
linkedin.jobs.collect_by_url urls: Types::Job
linkedin.jobs.discover_by_url urls: Types::Job
linkedin.jobs.discover_by_keyword queries: (Types::JobKeywordInput) Types::Job
linkedin.posts.collect_by_url urls: Types::Post
linkedin.posts.discover_by_url urls: Types::Post
linkedin.posts.discover_by_profile_url profile_urls: Types::Post
linkedin.posts.discover_by_company_url company_urls: Types::Post
linkedin.people.discover_new_profiles queries: (Types::PeopleDiscoverInput) Types::DiscoveredProfile

Discovery by keyword

query = BrightData::LinkedIn::Types::JobKeywordInput.new(
  location: "New York",
  keyword: "ruby",
  country: nil, time_range: nil, job_type: nil, experience_level: nil,
  remote: nil, company: nil, selective_search: nil,
  jobs_to_not_include: nil, location_radius: nil
)

jobs = client.linkedin.jobs.discover_by_keyword.scrape(queries: [query])

nil fields are omitted from the request payload.

Result types

Results are immutable Data value objects (Types::Profile, Types::Company, Types::Job, Types::Post, Types::DiscoveredProfile). Each exposes named readers for the common fields plus #raw, the full parsed response hash, so you can reach fields the gem does not yet model:

profile = profiles.first
profile.name        # named reader
profile.raw[:posts] # anything not yet modelled

The readers are not type-checked - Data.define gives you immutable structs with named fields, not static types. Treat them as a stable, documented shape for the common case, and reach for #raw when the API returns something the gem does not yet cover.

Error handling

All errors inherit from BrightData::Error:

  • BrightData::ConfigurationError — blank API token.
  • BrightData::ArgumentError — bad argument shape (note: not Ruby's ::ArgumentError).
  • BrightData::AuthError — 401/403 from the API.
  • BrightData::RateLimitError — 429; exposes #retry_after.
  • BrightData::ServerError — 5xx.
  • BrightData::HTTPError — other transport failures and timeouts.
  • BrightData::ScrapeTimeoutError — synchronous scrape exceeded the 60s cap; recover via error.snapshot.wait.
begin
  client.linkedin.profiles.scrape(urls: urls)
rescue BrightData::ScrapeTimeoutError => e
  e.snapshot.wait # fall back to async polling
rescue BrightData::RateLimitError => e
  sleep(e.retry_after || 5)
  retry
rescue BrightData::Error => e
  warn "Bright Data request failed: #{e.message}"
end

Documentation for AI agents

llm.md is a single-file, LLM-friendly reference generated from the gem's YARD documentation. Point a coding assistant at it for the full API surface and usage examples. Regenerate it with bin/prepare_release (or bundle exec yardoc --format=markdown && bin/generate_llm.rb).

License

Released under the MIT License.