Web scraping is a technique for automatically browsing a website and extracting information from it. It has many uses, including business ones. Google, for example, uses it for its search engine, extracting information from pages to rank its search results.
A scraper reads the HTML of a web page, and you can decide which section or piece of text to extract from it.
Let's take a basic example of how a web scraper works. Imagine that we want to extract the titles of the products on a web page where every product uses the same format: each title sits in an <h4> tag with a class called "product".
The HTML should look something like this: <h4 class="product">Product name</h4>.
A web scraper will search for every h4 tag with the class "product" and extract the names of all the products that follow that format. We can then use this information either as plain text or as the entire HTML.
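The search described above can be sketched in a few lines of Python. This is only an illustration using the standard library's html.parser module, not the code the platform runs; the names ProductTitleParser and extract_product_titles and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class ProductTitleParser(HTMLParser):
    """Collects the text of every <h4> whose class list includes "product"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside_product = False

    def handle_starttag(self, tag, attrs):
        # An element can carry several classes, so split the attribute value.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h4" and "product" in classes:
            self._inside_product = True
            self.titles.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a matching <h4>.
        if self._inside_product:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "h4":
            self._inside_product = False


def extract_product_titles(html):
    """Hypothetical helper: returns the product names found in the HTML."""
    parser = ProductTitleParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


# Invented sample page: two products and one unrelated heading.
sample_page = """
<h4 class="product">Blue kettle</h4>
<h4 class="headline">Not a product</h4>
<h4 class="product featured">Red toaster</h4>
"""
print(extract_product_titles(sample_page))  # ['Blue kettle', 'Red toaster']
```

The heading with the class "headline" is skipped because it does not match, while the one with two classes is kept because "product" is one of them.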
You may want to extract important information from your own website, or from others, to show it to your users or potential clients.
Scraping the web for content is a good way to keep your information up to date, since each scrape extracts the current content of a website. This avoids unnecessary manual updates to your chatbot, since the source of the information is stored somewhere else.
To start scraping, you will need to create a script. For more information, see How to create scripts.
Go to Scripts > Create New. Give the script a name and a description. It should look like this.
Let's take this example. You want to scrape the Freelancer website for newly available software-related jobs and the budgets of the projects.
In the Control section, click on More and then "Web Scraping".
Enter the URL https://www.freelancer.com/jobs/?keyword=software.
The budget of each project is inside a "div" tag with the class name "JobSearchCard-secondary-price". If you only want to search by tag, you can leave the class name empty.
To change the search type, use the "search by" section. You can search by id or by class.
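The tag, class, and id options behave roughly like the filter in the sketch below. It is an assumption about how such a search works in general, not the platform's actual implementation. ElementFinder, find_text, and the sample markup are hypothetical; only the class name "JobSearchCard-secondary-price" comes from the example above.

```python
from html.parser import HTMLParser


class ElementFinder(HTMLParser):
    """Collects the text of elements matching a tag and an optional class or id."""

    def __init__(self, tag, search_by="class", value=""):
        super().__init__()
        self.tag = tag
        self.search_by = search_by  # "class" or "id"
        self.value = value          # empty string means "match by tag only"
        self.results = []
        self._depth = 0             # greater than 0 while inside a match

    def _matches(self, attrs):
        if not self.value:
            return True
        attribute = dict(attrs).get(self.search_by) or ""
        if self.search_by == "class":
            return self.value in attribute.split()
        return attribute == self.value

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self._depth:
            self._depth += 1  # nested element with the same tag name
        elif self._matches(attrs):
            self._depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.results[-1] += data


def find_text(html, tag, search_by="class", value=""):
    """Hypothetical helper: returns the text of every matching element."""
    finder = ElementFinder(tag, search_by, value)
    finder.feed(html)
    finder.close()
    return [" ".join(text.split()) for text in finder.results]


# Invented markup that reuses the class name from the example.
sample = """
<div class="JobSearchCard-secondary-price">$250</div>
<div id="summary">12 jobs found</div>
<div class="JobSearchCard-secondary-price">$30 / hr</div>
"""
print(find_text(sample, "div", "class", "JobSearchCard-secondary-price"))
print(find_text(sample, "div", "id", "summary"))
print(find_text(sample, "div"))  # class left empty: every div matches
```

Leaving the value empty corresponds to searching by tag only, so the last call returns the text of all three div elements.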
Press create and save. Now your chatbot will scrape that webpage and show the results.
Messaging apps (Facebook, WhatsApp, etc.) cannot render HTML code, but the web chatbot can, so check the HTML box if you want to display full HTML in your chatbot.
For example, to display the names of the available jobs and the days left, we would enter the tag "div" and the class "JobSearchCard-primary-heading". We would also need to check the HTML box.
Now our web chatbot will be able to show the descriptions of the job offers. Selecting HTML is necessary if you want to display external links and images.
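To see why the HTML option matters for links, the following sketch captures the same element twice, once as plain text and once as HTML. It is a simplified illustration under assumptions: HeadingCapture, capture_heading, and the sample card are hypothetical, and the real Freelancer markup may differ.

```python
from html import escape
from html.parser import HTMLParser


class HeadingCapture(HTMLParser):
    """Captures the first matching <div> both as plain text and as HTML."""

    TARGET_CLASS = "JobSearchCard-primary-heading"

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.html_parts = []
        self._depth = 0     # greater than 0 while inside the matched div
        self._done = False  # True once the first match has been closed

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Inside the match: keep the tag exactly as written in the source.
            self.html_parts.append(self.get_starttag_text())
            if tag == "div":
                self._depth += 1
        elif not self._done and tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.TARGET_CLASS in classes:
                self._depth = 1
                self.html_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self.html_parts.append(f"</{tag}>")
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        if self._depth:
            self.text_parts.append(data)
            self.html_parts.append(escape(data, quote=False))


def capture_heading(html):
    """Hypothetical helper: returns (plain_text, html) for the first match."""
    parser = HeadingCapture()
    parser.feed(html)
    parser.close()
    text = " ".join(p.strip() for p in parser.text_parts if p.strip())
    return text, "".join(parser.html_parts)


# Invented job card with a link and a "days left" label.
card = (
    '<div class="JobSearchCard-primary-heading">'
    '<a href="/projects/example">Build a REST API</a>'
    '<span>6 days left</span>'
    '</div>'
)
as_text, as_html = capture_heading(card)
print(as_text)  # Build a REST API 6 days left
print(as_html)  # same markup as card, including the <a href="..."> link
```

The plain-text version loses the link target, which is why a messaging app can show only the words, while the web chatbot can render the HTML version with a clickable link.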
You can check both web scrapes by interacting with our chatbot.