Find links on a page

Usage

ragnar_find_links(
  x,
  depth = 0L,
  children_only = TRUE,
  progress = TRUE,
  ...,
  url_filter = identity
)

Arguments

x: URL, HTML file path, or XML document. For Markdown, convert to HTML using commonmark::markdown_html() first.
depth: Integer specifying how many levels deep to crawl for links. When depth > 0, the function will follow child links (links with x as a prefix) and collect links from those pages as well.
children_only: Logical or string. If TRUE, returns only child links (those having x as a prefix). If FALSE, returns all links found on the page. If a string, it is used as the prefix that defines a child link in place of x, as in the last example below. Note that regardless of this setting, only child links are followed when depth > 0.
progress: Logical, draw a progress bar if depth > 0. A separate progress bar is drawn per recursion level.
...: Currently unused. Must be empty.
url_filter: A function that takes a character vector of URLs and may subset them to return a smaller vector. This is useful for filtering URLs by rules other than children_only, which only checks the prefix.
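
As a minimal sketch of the kind of function url_filter accepts, the helper below takes a character vector of URLs and returns a subset. The name keep_reference_pages, its rules, and the example.com URLs are hypothetical and chosen only for illustration; they are not part of the package.

```r
# Hypothetical filter: drop URLs that contain an in-page anchor ("#"),
# then keep only URLs whose path contains "/reference/".
keep_reference_pages <- function(urls) {
  urls <- urls[!grepl("#", urls, fixed = TRUE)]
  urls[grepl("/reference/", urls, fixed = TRUE)]
}

# Illustrative input only; these URLs are placeholders.
candidates <- c(
  "https://example.com/docs/reference/a.html",
  "https://example.com/docs/articles/b.html",
  "https://example.com/docs/reference/index.html#section"
)
kept <- keep_reference_pages(candidates)
print(kept)
```

A function like this could then be supplied as url_filter = keep_reference_pages in a call to ragnar_find_links().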

Value

A character vector of links found on the page and, when depth > 0, on the child pages that were crawled.

Examples

if (FALSE) { # \dontrun{
ragnar_find_links("https://r4ds.hadley.nz/base-R.html")
ragnar_find_links("https://ellmer.tidyverse.org/")
ragnar_find_links("https://ellmer.tidyverse.org/", depth = 2)
ragnar_find_links("https://ellmer.tidyverse.org/", depth = 2, children_only = FALSE)
ragnar_find_links(
  paste0("https://github.com/Snowflake-Labs/sfquickstarts/",
         "tree/master/site/sfguides/src/build_a_custom_model_for_anomaly_detection"),
  children_only = "https://github.com/Snowflake-Labs/sfquickstarts",
  depth = 1
)
} # }
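
The note on x about Markdown input can be sketched as follows: convert the Markdown to HTML with commonmark::markdown_html(), parse it into an XML document, and pass that document as x. This is an illustration under assumptions, not a tested recipe: the Markdown text is made up, parsing with xml2::read_html() is one possible way to obtain the XML document, and children_only = FALSE is used on the assumption that an in-memory document has no URL to serve as a prefix.

```r
# Made-up Markdown input containing two links.
md <- paste(
  "# Notes",
  "",
  "See [ellmer](https://ellmer.tidyverse.org/) and [R4DS](https://r4ds.hadley.nz/).",
  sep = "\n"
)

# Convert Markdown to an HTML string, then parse it into an XML document.
html <- commonmark::markdown_html(md)
doc <- xml2::read_html(html)

# Only call ragnar if it is installed. children_only = FALSE is an
# assumption made because the parsed document has no URL of its own.
if (requireNamespace("ragnar", quietly = TRUE)) {
  links <- ragnar::ragnar_find_links(doc, children_only = FALSE)
  print(links)
}
```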