How to scrape a webpage with a chatbot

Web scraping is a technique for automatically browsing a website and extracting information from it. It has many uses, including business ones. Google, for example, uses it for its search engine, extracting information from pages to rank its search results.

The process extracts HTML from a website. You can even decide which section or text to extract.

How does web scraping work?

Let's take a basic example of how a web scraper works. Suppose we want to extract the titles of the products on a webpage where every product uses the same format: each one is in an <h4> tag with a class called "product".

The HTML should look something like this: <h4 class="product">Product name</h4>.

A web scraper searches for every h4 tag with the class product and extracts the names of all the products in that format. We can then use this information either as plain text or as the entire HTML.
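
The search described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under assumptions, not how any particular chatbot platform implements it: the names ProductTitleParser, extract_product_titles and sample_page are made up for this example, and the sample HTML is invented.

```python
from html.parser import HTMLParser


class ProductTitleParser(HTMLParser):
    """Collects the text of every <h4> element whose class list contains "product"."""

    def __init__(self):
        super().__init__()
        self.inside_product = False  # True while we are inside a matching <h4>
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # An element can carry several classes, so split the attribute value.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h4" and "product" in classes:
            self.inside_product = True
            self.titles.append("")

    def handle_data(self, data):
        if self.inside_product:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "h4":
            self.inside_product = False


def extract_product_titles(html):
    """Return the text of each matching product heading (hypothetical helper)."""
    parser = ProductTitleParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


# Invented sample page: two products and one heading that should be ignored.
sample_page = """
<h4 class="product">Red shoes</h4>
<h4 class="other">Not a product</h4>
<h4 class="product featured">Blue hat</h4>
"""
print(extract_product_titles(sample_page))  # ['Red shoes', 'Blue hat']
```

The heading with the class "other" is skipped because only the combination of tag and class counts as a match.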

Scrape a webpage with a chatbot

You may want to extract important information from your own website or from other sites to show to your users or potential clients.

Scraping the web for content is a good way to keep your information up to date, since each scrape extracts the current content of the website. It also avoids unnecessary updates to your chatbot, since the information is stored at its source rather than in the chatbot.

How to scrape a webpage with your chatbot

To start scraping, you will need to create a script. More information in How to create scripts.

Go to Scripts > Create New. Give it a name and a description. It should look like this.

Let's take this example: you want to scrape the Monster website for newly available jobs in Software Development and the companies that are offering them.

In the Control section, click on More and then "Web Scraping".

Enter the URL https://www.monster.com/jobs/search/?q=Software-Developer&where=Australia.

The company names are inside "div" tags with the class name "company", so enter "div" as the tag and "company" as the class name. If you only want to search by tag, you can leave the class name empty.

To change the search type, use the "search by" section. You can search by id or by class.

Press Create and save. Now your chatbot will scrape that webpage and show the results.
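
To make the tag, class and id options more concrete, here is a rough Python sketch of the same kind of search. It is only an illustration under assumptions: the function scrape, its parameters and the sample HTML are hypothetical, the real markup of the Monster page may differ, and the chatbot platform's own implementation is not shown here. The sample is a hard-coded string rather than a live download so the example runs offline.

```python
from html.parser import HTMLParser


class ElementTextParser(HTMLParser):
    """Collects the text of elements matching a tag and an optional class or id."""

    def __init__(self, tag, search_by="class", value=""):
        super().__init__()
        self.tag = tag
        self.search_by = search_by  # "class" or "id"
        self.value = value          # empty string means "match on tag only"
        self.depth = 0              # nesting depth inside the current match
        self.results = []

    def _matches(self, tag, attrs):
        if tag != self.tag:
            return False
        if not self.value:
            return True
        attr = dict(attrs).get(self.search_by) or ""
        if self.search_by == "class":
            return self.value in attr.split()
        return attr == self.value

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside a match: count nested tags of the same name
            # so the matching end tag is found correctly.
            if tag == self.tag:
                self.depth += 1
        elif self._matches(tag, attrs):
            self.depth = 1
            self.results.append("")

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1


def scrape(html, tag, search_by="class", value=""):
    """Return the whitespace-normalized text of every matching element."""
    parser = ElementTextParser(tag, search_by, value)
    parser.feed(html)
    parser.close()
    return [" ".join(text.split()) for text in parser.results]


# Invented stand-in for a job results page.
sample_results = """
<div id="results">
  <div class="company">Acme Pty Ltd</div>
  <div class="company">Example Software</div>
</div>
"""
print(scrape(sample_results, "div", "class", "company"))
print(scrape(sample_results, "div", "id", "results"))
```

Searching by class returns one entry per company, while searching by id returns the single container element with all of its text.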

Display HTML

Messaging apps (Facebook, WhatsApp, etc.) cannot render HTML, but the web chatbot can. Check the HTML box if you want to display the full HTML in your web chatbot.

For example, to display the names of the newly available jobs together with their links, enter the tag "h2" and the class "title", and check the HTML box.

Now our web chatbot will be able to show the links to the job offers.
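
The difference between extracting text and extracting HTML can be sketched as follows. This is a hedged illustration only: TitleParser, extract_titles and the sample markup are assumptions made for this example, and whether the platform returns the inner HTML (as here) or the whole element is not stated in this guide.

```python
from html import escape
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Captures each <h2 class="title"> element as plain text and as inner HTML."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth of <h2> inside the current match
        self.items = []  # one {"text": ..., "html": ...} dict per match

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "h2":
                self.depth += 1
            # Keep nested tags (such as links) exactly as they were written.
            self.items[-1]["html"] += self.get_starttag_text()
        elif tag == "h2" and "title" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.items.append({"text": "", "html": ""})

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> have no separate end tag to append.
        if self.depth:
            self.items[-1]["html"] += self.get_starttag_text()

    def handle_data(self, data):
        if self.depth:
            self.items[-1]["text"] += data
            # The parser unescapes entities, so escape them again for HTML.
            self.items[-1]["html"] += escape(data, quote=False)

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "h2":
            self.depth -= 1
            if self.depth == 0:
                return  # closing tag of the match itself is not inner HTML
        self.items[-1]["html"] += f"</{tag}>"


def extract_titles(html):
    """Return text and inner HTML for each matching title (hypothetical helper)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Invented sample: a job title wrapped in a link.
sample_jobs = '<h2 class="title"><a href="/job/1">Software Developer</a></h2>'
for item in extract_titles(sample_jobs):
    print("text:", item["text"])
    print("html:", item["html"])
```

The text version loses the link, which is why the HTML option matters when the chatbot should show clickable job offers.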

You can check both web scrapes by interacting with our chatbot.
