brightdata
A small, ergonomic Ruby client for Bright Data's
Datasets v3 scraper APIs. Returns parsed results as immutable Ruby `Data` value
objects with named readers. Version 0.1.0 ships the LinkedIn endpoints.
Installation
Add it to your Gemfile:

```ruby
gem "brightdata"
```

Then run `bundle install`, or install it directly:

```shell
gem install brightdata
```

Requires Ruby 3.4.4 or newer.
Configuration
Create a client with your Bright Data API token:

```ruby
client = BrightData::Client.new(api_token: ENV.fetch("BRIGHTDATA_API_TOKEN"))
```

Optional keyword arguments:

- `base_url:` overrides the API host (defaults to `https://api.brightdata.com`).
- `logger:` takes a `Logger` that receives one `debug` line per request.
Synchronous vs. asynchronous
Every endpoint exposes two methods:

- `#scrape(...)` runs synchronously and returns parsed value-object results. Bright Data caps synchronous scrapes at 60 seconds; if a job exceeds that, `scrape` raises `BrightData::ScrapeTimeoutError`, which carries a resumable snapshot (`error.snapshot`).
- `#trigger(...)` starts an asynchronous collection and returns a `BrightData::Snapshot` that you poll with `#wait`.
```ruby
# Synchronous
profiles = client.linkedin.profiles.scrape(
  urls: ["https://www.linkedin.com/in/example/"]
)
profiles.first.name # => "Example Person"

# Asynchronous
snapshot = client.linkedin.profiles.trigger(
  urls: ["https://www.linkedin.com/in/example/"]
)
result = snapshot.wait # blocks, polling progress until ready/failed/timeout

if result.success?
  result.payload # => Array<BrightData::LinkedIn::Types::Profile>
else
  result.error   # => raw failure payload from Bright Data
end
```

`Snapshot#wait` accepts `timeout:` (default 300 s) and `poll_interval:`
(default 5 s), and raises `BrightData::ScrapeTimeoutError` if the deadline
passes before the snapshot reaches a terminal state.
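To make the `timeout:` / `poll_interval:` semantics concrete, here is a minimal, self-contained sketch of a poll-until-deadline loop. It is not the gem's implementation: `poll_until` and `FakeTimeout` are hypothetical names, and the block stands in for whatever status check `Snapshot#wait` actually performs.

```ruby
# Hypothetical stand-in for BrightData::ScrapeTimeoutError so this runs without the gem.
class FakeTimeout < StandardError; end

# Sketch of poll-until-deadline semantics (NOT the gem's code).
# Calls the block every poll_interval seconds until it returns something other
# than :running, or raises FakeTimeout once `timeout` seconds have elapsed.
def poll_until(timeout: 300, poll_interval: 5)
  # Monotonic clock: wall-clock adjustments cannot shorten or stretch the deadline.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    status = yield
    return status unless status == :running # terminal state reached

    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    raise FakeTimeout, "not ready after #{timeout}s" if remaining <= 0

    sleep([poll_interval, remaining].min) # never sleep past the deadline
  end
end

# Usage: a fake job that reports :running twice, then :ready.
calls = 0
status = poll_until(timeout: 1, poll_interval: 0.01) do
  calls += 1
  calls < 3 ? :running : :ready
end
```

Capping each sleep at the remaining time keeps the overall wait close to `timeout` even when `poll_interval` is large.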
LinkedIn endpoints
| Call | Argument | Returns |
|---|---|---|
| `linkedin.profiles` | `urls:` | `Types::Profile` |
| `linkedin.companies` | `urls:` | `Types::Company` |
| `linkedin.jobs.collect_by_url` | `urls:` | `Types::Job` |
| `linkedin.jobs.discover_by_url` | `urls:` | `Types::Job` |
| `linkedin.jobs.discover_by_keyword` | `queries:` (`Types::JobKeywordInput`) | `Types::Job` |
| `linkedin.posts.collect_by_url` | `urls:` | `Types::Post` |
| `linkedin.posts.discover_by_url` | `urls:` | `Types::Post` |
| `linkedin.posts.discover_by_profile_url` | `profile_urls:` | `Types::Post` |
| `linkedin.posts.discover_by_company_url` | `company_urls:` | `Types::Post` |
| `linkedin.people.discover_new_profiles` | `queries:` (`Types::PeopleDiscoverInput`) | `Types::DiscoveredProfile` |
Discovery by keyword
```ruby
query = BrightData::LinkedIn::Types::JobKeywordInput.new(
  location: "New York",
  keyword: "ruby",
  country: nil, time_range: nil, job_type: nil, experience_level: nil,
  remote: nil, company: nil, selective_search: nil,
  jobs_to_not_include: nil, location_radius: nil
)
jobs = client.linkedin.jobs.discover_by_keyword.scrape(queries: [query])
```

`nil` fields are omitted from the request payload.
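One way to picture the `nil` omission is `Data#to_h` followed by `Hash#compact`. This is a hedged sketch: `KeywordQuery` is a hypothetical stand-in with the same member names as `Types::JobKeywordInput`, and the gem may build its payload differently.

```ruby
# Hypothetical stand-in with the same members as Types::JobKeywordInput.
KeywordQuery = Data.define(
  :location, :keyword, :country, :time_range, :job_type, :experience_level,
  :remote, :company, :selective_search, :jobs_to_not_include, :location_radius
)

query = KeywordQuery.new(
  location: "New York", keyword: "ruby",
  country: nil, time_range: nil, job_type: nil, experience_level: nil,
  remote: nil, company: nil, selective_search: nil,
  jobs_to_not_include: nil, location_radius: nil
)

# Drop nil members so only the fields you actually set reach the request body.
payload = query.to_h.compact
payload # => {location: "New York", keyword: "ruby"}
```

For a plain `Data.define` class, `new` raises `ArgumentError` when a keyword member is missing, which is presumably why the example above spells out every unused field as `nil`.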
Result types
Results are immutable `Data` value objects (`Types::Profile`, `Types::Company`,
`Types::Job`, `Types::Post`, `Types::DiscoveredProfile`). Each exposes named
readers for the common fields plus `#raw`, the full parsed response hash, so you
can reach fields the gem does not yet model:

```ruby
profile = profiles.first
profile.name        # named reader
profile.raw[:posts] # anything not yet modelled
```

The readers are not type-checked: `Data.define` gives you immutable structs
with named fields, not static types. Treat them as a stable, documented shape
for the common case, and reach for `#raw` when the API returns something the
gem does not yet cover.
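A quick way to see what `Data.define` does and does not guarantee. `ProfileLike` is a hypothetical struct, not the gem's `Types::Profile`; it only mirrors the "named readers plus `raw`" shape described above.

```ruby
# Hypothetical stand-in mirroring the "named readers + raw" shape.
ProfileLike = Data.define(:name, :raw)

profile = ProfileLike.new(name: "Example Person", raw: { name: "Example Person", posts: [] })

profile.name        # named reader
profile.raw[:posts] # fields without a dedicated reader

# Immutable: the instance is frozen, has no setters, and #with returns a copy.
profile.frozen?             # => true
profile.respond_to?(:name=) # => false
renamed = profile.with(name: "Someone Else")

# Immutability is shallow: the hash held in raw is not frozen by Data itself.
profile.raw.frozen? # => false

# Not type-checked: Data accepts any value for any member.
odd = ProfileLike.new(name: 42, raw: nil)
odd.name # => 42
```

Whether the gem deep-freezes `#raw` is not stated here; with plain `Data`, only the outer object is frozen.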
Error handling
All errors inherit from `BrightData::Error`:

- `BrightData::ConfigurationError`: blank API token.
- `BrightData::ArgumentError`: bad argument shape (note: not Ruby's `::ArgumentError`).
- `BrightData::AuthError`: 401/403 from the API.
- `BrightData::RateLimitError`: 429; exposes `#retry_after`.
- `BrightData::ServerError`: 5xx.
- `BrightData::HTTPError`: other transport failures and timeouts.
- `BrightData::ScrapeTimeoutError`: a synchronous scrape exceeded the 60 s cap; recover via `error.snapshot.wait`.
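The `::ArgumentError` note matters because of Ruby's lexical constant lookup. In this sketch, `Demo` is a hypothetical module standing in for `BrightData`, assuming (as the list says) that its `ArgumentError` descends from the gem's base `Error` rather than from Ruby's built-in class.

```ruby
# Hypothetical stand-in for the error hierarchy described above.
module Demo
  class Error < StandardError; end
  class ArgumentError < Error; end # shadows ::ArgumentError inside Demo

  # Inside the module body, a bare ArgumentError resolves to Demo::ArgumentError.
  def self.validate!(urls)
    raise ArgumentError, "urls must be an Array" unless urls.is_a?(Array)
    urls
  end
end

# At top level, a bare ArgumentError means Ruby's ::ArgumentError,
# which does NOT catch Demo::ArgumentError.
caught_by =
  begin
    Demo.validate!("https://www.linkedin.com/in/example/")
  rescue ArgumentError
    :builtin
  rescue Demo::Error
    :gem_base
  end

caught_by # => :gem_base
```

So when calling the gem, rescue `BrightData::ArgumentError` (or `BrightData::Error`) by its full name rather than relying on a bare `rescue ArgumentError`.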
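```ruby
urls = ["https://www.linkedin.com/in/example/"]
attempts = 0

begin
  profiles = client.linkedin.profiles.scrape(urls: urls)
rescue BrightData::ScrapeTimeoutError => e
  result = e.snapshot.wait # fall back to async polling
  profiles = result.payload if result.success?
rescue BrightData::RateLimitError => e
  attempts += 1
  raise if attempts > 3 # re-raise once three retries are used up
  sleep(e.retry_after || 5)
  retry
rescue BrightData::Error => e
  warn "Bright Data request failed: #{e.message}"
end
```

Rescue the specific subclasses before `BrightData::Error`: Ruby runs the first `rescue` clause that matches.

Documentation for AI agents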
`llm.md` is a single-file, LLM-friendly reference generated from the
gem's YARD documentation. Point a coding assistant at it for the full API
surface and usage examples. Regenerate it with `bin/prepare_release` (or
`bundle exec yardoc --format=markdown && bin/generate_llm.rb`).
License
Released under the MIT License.