feat(CI): add a GitHub Action to lint the Markdown
- Lint Markdown for blank lines and code block languages
- Add a `.markdownlint.yml` config file
- Run Action on pull request only
- Fix missing HTTPS on web.scraper.workers.dev links
- Add missing comma
README.md (+7 -11)
@@ -2,7 +2,7 @@
 
 Web Scraper makes it effortless to scrape websites. You provide a URL and CSS selector and it will return you JSON containing the text contents of the matching elements. You can also scrape HTML attribute values by optionally specifying an attribute name.
 
-[Website →](http://web.scraper.workers.dev)
+[Website →](https://web.scraper.workers.dev)
 
 [](https://deploy.workers.cloudflare.com/?url=https://github.com/adamschwartz/web.scraper.workers.dev)
 
@@ -53,14 +53,13 @@ Web Scraper makes it effortless to scrape websites. You provide a URL and CSS se
 }
 ```
 
-
 ## API
 
-- Requests are made as `GET` against `http://web.scraper.workers.dev`.
+- Requests are made as `GET` against `https://web.scraper.workers.dev`.
 - There are <strong>two required</strong> query params, `url` and `selector`.
 - There are three optional query params, `attr`, `pretty` and `spaced`.
 
-<pre><code>http://web.scraper.workers.dev
+<pre><code>https://web.scraper.workers.dev
  ?<strong>url</strong>=https://example.com
  &<strong>selector</strong>=p
  &<strong>attr</strong>=title
@@ -124,7 +123,7 @@ If an `attr` is provided, the result will be a string matching only the first no
 
 Consider the following DOM structure:
 
-```
+```html
 <div><p>This is the first paragraph.</p><p>This is another paragraph.</p></div>
 ```
 
@@ -138,7 +137,6 @@ With `spaced` set to `true`, the result is:
 
 ```This is the first paragraph. This is another paragraph.```
 
-
 ## Development
 
 Web Scraper is powered by [Cloudflare Workers](https://workers.cloudflare.com), heavily utilizing [HTMLRewriter](https://developers.cloudflare.com/workers/reference/apis/html-rewriter/) for parsing the HTML and scraping the text.
@@ -147,24 +145,22 @@ To develop Web Scraper locally, pull down the repo, and follow these steps:
-This will open up the Workers preview experience so you can test and debug the site. The main source can be found in `index.js`. As you make changes you’ll see them live in the previewer.
+This will open up the Workers preview experience, so you can test and debug the site. The main source can be found in `index.js`. As you make changes you’ll see them live in the previewer.
 
 ## Deploying
 
 Web Scraper is deployed automatically when changes are pushed to master using a [GitHub Action](https://github.com/features/actions) and the [Workers CLI](https://github.com/cloudflare/wrangler).
 
-
 ## Author
 
 Web Scraper was created by [Adam Schwartz](https://adamschwartz.co).
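
For reviewers who want to check the HTTPS endpoint touched by this change, the request format described in the README's API section (a `GET` with the required `url` and `selector` params and the optional `attr`, `pretty` and `spaced` params) could be sketched roughly as follows. This is only an illustration: the `ScrapeOptions` type and the `buildScrapeUrl` helper are hypothetical names that do not appear in the repository, and sending `pretty=true` is an assumption modelled on the README's "`spaced` set to `true`" wording.

```typescript
// Base endpoint as given in the README after this change (HTTPS).
const BASE_URL = "https://web.scraper.workers.dev";

// Hypothetical options type mirroring the query params listed in the README.
interface ScrapeOptions {
  url: string;       // required: page to scrape
  selector: string;  // required: CSS selector to match
  attr?: string;     // optional: attribute name to read instead of text
  pretty?: boolean;  // optional: assumed to be sent as "true" when set
  spaced?: boolean;  // optional: sent as "true" when set
}

// Hypothetical helper: builds the GET request URL with encoded query params.
function buildScrapeUrl(opts: ScrapeOptions): string {
  const params = new URLSearchParams({
    url: opts.url,
    selector: opts.selector,
  });
  if (opts.attr !== undefined) params.set("attr", opts.attr);
  if (opts.pretty) params.set("pretty", "true");
  if (opts.spaced) params.set("spaced", "true");
  return `${BASE_URL}/?${params.toString()}`;
}

// Example usage with the values from the README's sample request.
const exampleUrl = buildScrapeUrl({
  url: "https://example.com",
  selector: "p",
  attr: "title",
});
console.log(exampleUrl);
```

Unlike the README's sample, which shows the target URL unencoded for readability, `URLSearchParams` percent-encodes it, so the example prints `https://web.scraper.workers.dev/?url=https%3A%2F%2Fexample.com&selector=p&attr=title`.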