In the finished project you see 3 zones:
1 Project common – The settings page where we are now.
  Elements to scrape – The elements we will be scraping.
  Robot settings – More detailed settings for the application.
2 In this area we set the name of the project and where it will be saved on our computer.
3 The home page of the website we will be parsing and the interactive view of the website where we will be selecting items.
Elements to scrape
1 In the first block we see all the items we have selected.
2 This is our workspace where we can select items for scraping.
3 This is the field where we enter the page we want to retrieve. If you accidentally navigate to another page while selecting items, you can return to the original page by pressing “Navigate” in this block.
Robot settings
These are the basic settings of our parser.
Robot speed – This is the speed of Data Excavator, set as the number of simultaneous threads that perform scraping. Choose the value that works best for the capacity of your system.
What website pages should be scanned? – This section defines which pages the program will scan.
What website pages should be scraped? – This section defines the pages from which data will be scraped.
These are the most basic and simple settings you will need.
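To make the difference between these three settings concrete, here is a minimal Python sketch. It is not how Data Excavator works internally. The URL patterns, the `example.com` addresses, and the names `should_scan`, `should_scrape`, `process`, and `run` are all hypothetical and serve only as an illustration. The sketch assumes that "scanned" pages are the ones the robot visits, that "scraped" pages are the subset it extracts data from, and that the robot speed is the number of worker threads.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rules standing in for the two "What website pages..." sections.
SCAN_PATTERN = re.compile(r"^https://example\.com/(catalog|product)/")
SCRAPE_PATTERN = re.compile(r"^https://example\.com/product/")

# Stand-in for "Robot speed": the number of simultaneous worker threads.
ROBOT_SPEED = 4


def should_scan(url: str) -> bool:
    """True if the page should be visited at all."""
    return SCAN_PATTERN.match(url) is not None


def should_scrape(url: str) -> bool:
    """True if data should be extracted from the page."""
    return SCRAPE_PATTERN.match(url) is not None


def process(url: str) -> tuple[str, str]:
    """Classify a URL; a real robot would download and parse it here."""
    if should_scrape(url):
        return url, "scan + scrape"
    if should_scan(url):
        return url, "scan only"
    return url, "skip"


def run(urls: list[str], speed: int = ROBOT_SPEED) -> dict[str, str]:
    """Process all URLs using `speed` threads at the same time."""
    with ThreadPoolExecutor(max_workers=speed) as pool:
        return dict(pool.map(process, urls))


if __name__ == "__main__":
    pages = [
        "https://example.com/catalog/shoes",
        "https://example.com/product/42",
        "https://example.com/about",
    ]
    for url, decision in run(pages).items():
        print(url, "->", decision)
```

In this sketch a catalog page is scanned (so that links to products can be found) but not scraped, while a product page is both scanned and scraped. Raising `ROBOT_SPEED` lets more pages be processed at once, at the cost of more load on your system and on the website.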
After entering all the settings, we can check whether everything works correctly. To do this, press “Test”.
In the menu that opens we can see how the program worked and what it extracted. On the left we see the result of extraction in HTML code and in text format.
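The difference between the HTML result and the text result can be sketched in a few lines of Python. This is only an illustration using the standard library, under the assumption that the text format is the HTML with its tags removed. The `TextExtractor` class, the `to_text` function, and the sample fragment are hypothetical and are not part of Data Excavator.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML fragment and ignores the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called for every piece of text found between tags.
        text = data.strip()
        if text:
            self.parts.append(text)


def to_text(html_fragment: str) -> str:
    """Return the text content of an HTML fragment as a single string."""
    parser = TextExtractor()
    parser.feed(html_fragment)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    html_result = '<div class="price"><b>19.99</b> USD</div>'
    print("HTML:", html_result)
    print("Text:", to_text(html_result))
```

The HTML form keeps the markup of the selected element, which is useful when you need attributes or nested structure. The text form is usually what ends up in the exported table.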
IMPORTANT: for the test you need to insert a link to a page from which data will actually be scraped, not a link to a section of the website.
If you are satisfied with the result, close the test window and return to the settings. To save the project, click the “Save” button.
If you want to import a finished project, click the “Import” button.
After saving, your project will be in the My projects column.
With the “Start” button we can start our project.
With the “Stop” button we can pause the scraping process.
- Export – exports the finished table of data.
- Actions – operations on the finished project (copying, deleting, etc.).
- Settings – returns you to the project settings you made before.
Click to view: Lesson 4: Creating your first project