DataExcavator update from 1.2.9 to 1.2.10
In this update, we added CAPTCHA handling. We tested several services that recognize CAPTCHAs automatically, but their quality did not satisfy us: in our tests, automatic recognition solved only 2-3 CAPTCHAs out of 10. We decided to take the simplest route and surface CAPTCHA handling in the interface. Now, when a CAPTCHA appears, the application asks you to solve it manually and then continues working.
The second important change is that we switched the core from CEF.OffScreen to CEF.Wpf. Instead of a windowless browser model, we now use an almost complete web browser. This is most noticeable in the test window: instead of a screenshot of the scanned page, you now see a full navigation window. In our opinion, this simplifies and improves interaction with the application.
DataExcavator update from 1.2.8 to 1.2.9
In this update we worked on how task processing is displayed. The progress bar for each task now shows the actual percentage of completion; previously it showed only an approximate value.
The field “Pages that are currently being processed by scraper” has been added to task cards. It lets you see the progress of the scraper in more detail. This value is calculated from the number of working and idle threads.
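As an illustration of the two indicators described above, here is a minimal Python sketch. It is not the application's code: the TaskProgress class, its fields, and the assumption that each working thread holds exactly one page are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskProgress:
    """Hypothetical snapshot of one scraping task."""
    pages_done: int          # pages fully processed so far
    pages_total: int         # pages known to belong to the task
    worker_busy: List[bool]  # one flag per worker thread: True = working, False = idle

    def percent_complete(self) -> float:
        # Actual percentage, computed from real counters rather than estimated.
        if self.pages_total == 0:
            return 0.0
        return 100.0 * self.pages_done / self.pages_total

    def pages_in_progress(self) -> int:
        # Assumption: every working thread is processing exactly one page.
        return sum(1 for busy in self.worker_busy if busy)


progress = TaskProgress(pages_done=25, pages_total=100,
                        worker_busy=[True, False, True, True])
print(progress.percent_complete(), progress.pages_in_progress())
```

Counting busy threads avoids keeping a separate list of in-flight pages, at the cost of being only as accurate as the one-page-per-thread assumption.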
Now the task stops automatically once the scraper has finished. Previously the task stayed in “standby” mode; we changed this on the recommendation of our clients (thanks to Mr. Rudi for the good advice). The behavior is now clearer and more convenient.
We have also adjusted the export functions: some typical errors are now handled, and information is displayed more visibly. Please note that if you export scraping results to a directory, export data already in that directory will be overwritten. If some of those files are open in other programs at that moment, an error will occur.
DataExcavator update from 1.2.7 to 1.2.8 &
ExcavatorSharp update from 1.2.3 to 1.2.7
This is a small technical update. We have added error logging, with critical failures being sent to our server. Now if something happens to your application, we will know about it.
We have also significantly revised the logging mechanism inside the scraping project. Each project’s data logger is now enabled only when the project is actually launched. Previously, the logger started immediately and went into “standby” mode, which in some unlucky cases caused the application to hang. In addition, we added ready-made templates for several sites, for example coparts.com, ebay.com, macys.com and geeksbuying.com.
We also upgraded our scraping library. This is a technical update: the library is catching up with the version of the application. Information about the library is available at the same link as before.
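The idea of starting a project's logger only at launch, instead of leaving it in standby, can be sketched in Python as follows. This is a generic illustration under our own assumptions, not the library's implementation; the ProjectLogger class and its methods are hypothetical names.

```python
import queue
import threading


class ProjectLogger:
    """Hypothetical logger whose writer thread exists only while the project runs."""

    def __init__(self, sink):
        self._sink = sink            # callable that receives each log message
        self._queue = queue.Queue()
        self._thread = None          # no background thread until start() is called
        self._lock = threading.Lock()

    @property
    def running(self):
        return self._thread is not None and self._thread.is_alive()

    def start(self):
        # Called when the project is launched, not when it is created.
        with self._lock:
            if self.running:
                return
            self._thread = threading.Thread(target=self._drain, daemon=True)
            self._thread.start()

    def log(self, message):
        self._queue.put(message)

    def stop(self):
        # Called when the project stops: flush pending messages, then end the thread.
        with self._lock:
            if not self.running:
                return
            self._queue.put(None)    # sentinel that tells the writer to exit
            self._thread.join()
            self._thread = None

    def _drain(self):
        while True:
            message = self._queue.get()
            if message is None:
                break
            self._sink(message)


messages = []
logger = ProjectLogger(messages.append)
logger.start()
logger.log("project launched")
logger.stop()
print(messages)
```

With this shape, an idle project owns no background thread at all, so there is nothing left in standby that could block the application.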
DataExcavator update from 1.2.6 to 1.2.7
Just a small technical update. We added the ability to extract data from iframe blocks. You can now access them with conventional CSS selectors, as if an iframe were an ordinary page element such as a div. For iframes to work correctly, enable the “expand iframe” option in the Crawling server settings.
We have finally started working on the CAPTCHA processor. We chose https://2captcha.com/ as our processing service. The initial CAPTCHA processing algorithms have been written, but this code is not yet included in the final build. We expect to ship this functionality in upcoming releases.
As usual, we have improved the application in several parts, making it more stable and convenient.
DataExcavator update from 1.2.5 to 1.2.6
This release continues our work on interface usability. In accordance with the wishes of our users, we have improved the handling of modal windows. Window groups are now minimized and restored together and react well to switching between applications. We have also finally identified and fixed a data export problem in which several projects with the same domain could cause export errors.
Improved naming of modal windows. Headers now show more relevant information about the project you are currently working with.
As a nice refinement, we finally implemented a live preview of files from the scraping results and from the settings test window. You can now view files extracted from sites directly in our application. This is convenient for assessing image quality and deciding whether the extracted data is correct.
In general, this is a small update that improves usability and fixes some current application errors.
DataExcavator update from 1.2.4 to 1.2.5
Whoa! It’s been a tough month. Having collected a decent number of complaints about the small font in the interface and the constant confusion caused by competing modal windows, we decided to make some changes.
Let’s start with the interface enhancements. First, we increased the font size in all windows. Second, we greatly simplified many parameter names and their short descriptions. Third, we completely redesigned the way information is displayed. Previously, we were convinced that competing modal windows were a good idea. Alas, they are not. They only confuse people, and we ourselves got confused several times (LOL). Accordingly, almost all blocks with separate logic are now displayed in true modal windows that do not let you jump from project to project. This prevents confusion: at every moment you know exactly which project you are working on. Fourth, we greatly improved the project cards on the main screen. Each card now has a progress bar showing the current percentage of scraped pages of the website. The main menu of the project has also been changed and is now a little more convenient and understandable.
Another large block of work is the so-called “data patterns”. We have made it very easy to configure the list of nodes you want to extract from the site. It now looks like a list of CSS selectors (or XPath expressions) that is available directly in the settings window. You no longer need 3-4 extra clicks to add one node for scraping: all nodes are in front of you at once.
In general, we decided to take the path of interface simplification. Even though we position our application as a professional scraping utility, simplicity and ergonomics still matter.
We have also made some corrections to the kernel, but decided not to publish a separate kernel release. In particular, we fixed a problem with scraping .pdf files, and we fixed incorrect handling of the Clean-param parameter from the robots.txt file. On the whole, as always, the application has become more stable.
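For readers unfamiliar with Clean-param, the Python sketch below shows what honoring it can look like. It is not the kernel's code. It assumes the commonly documented form of the directive, "Clean-param: p0&p1 /path-prefix", and ignores wildcards and user-agent groups; the function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def parse_clean_param(robots_txt):
    """Collect (parameter names, path prefix) pairs from Clean-param lines."""
    rules = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "clean-param":
            continue
        parts = value.split()
        if not parts:
            continue
        params = set(parts[0].split("&"))
        prefix = parts[1] if len(parts) > 1 else "/"  # no path means the whole site
        rules.append((params, prefix))
    return rules


def clean_url(url, rules):
    """Remove query parameters that the matching rules mark as insignificant."""
    parts = urlsplit(url)
    drop = set()
    for params, prefix in rules:
        if parts.path.startswith(prefix):
            drop |= params
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in drop]
    return urlunsplit(parts._replace(query=urlencode(kept)))


robots = "User-agent: *\nClean-param: ref&sid /catalog/\n"
rules = parse_clean_param(robots)
print(clean_url("https://example.com/catalog/item?id=7&ref=mail&sid=abc", rules))
```

Normalizing URLs this way lets a crawler treat pages that differ only in tracking or session parameters as one page instead of downloading each variant.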
DataExcavator update from 1.2.3 to 1.2.4
In this version, we have added a modal window that lets you choose how the site is analyzed. The window asks for details of the crawling algorithm: should the app download all pages of the site, or only the pages from a user-supplied list? Although this setting was already available in the project properties window, we believe the new window will greatly simplify the application for new users. It is much easier to click one additional button than to dig through the settings looking for the option “Method of links analysis”.
In addition, we found several small bugs in project testing and in downloading links from separate pages. These bugs have been fixed as well.
As always, the application has become more stable and convenient.
ExcavatorSharp and DataExcavator updates from 1.2.2 to 1.2.3
In this update, as usual, we have increased the stability of the application by fixing some bugs and running a number of load tests.
The most significant improvement is project templates. Templates contain ready-made settings for well-known sites. The first 4 templates are Amazon, Aliexpress, Craigslist and Walmart. We will try to add new templates in each release. This saves time on configuration: just use a ready-made template instead of setting up a project from scratch.
In the kernel library we have fixed several significant problems. Under certain circumstances, the functions “test settings” and “get links from the site page” would hang; this is now fixed. We have also eliminated a problem where log files could stay locked permanently, which also occurred occasionally under unlucky circumstances.
ExcavatorSharp and DataExcavator updates from 1.2.1 to 1.2.2
In this release we focused on fixing interface errors. We have added more thorough exception handling for various situations. Control over importing settings and copying projects has been significantly improved. Exception handling when trying to run several instances of the program has been improved (remember: only one instance of the program can run at a time). We also added a nice feature for sorting patterns and copying elements inside a pattern. In general, as usual, the application has become more stable and a bit more convenient.
ExcavatorSharp and DataExcavator updates from 1.2 to 1.2.1
Several bugs have been fixed hot on the heels of the previous release. In particular, work with the file system is now more stable, and errors during installation under an account with limited rights have been corrected. The way some modules work with the file system has been improved. The behavior of the CSS selector auto-detection window has been fixed.
Interface improvements: logs are now displayed in the waiting windows, so you can see live what the program is doing at any given moment. No more endless loader 😉
ExcavatorSharp and DataExcavator updates from 1.1 to 1.2
Well, it’s been a tough month. We listed the application on Codester and CodeCanyon and got some sales. We also found several problems with the application 🤷‍♂️ related to license key validation and to the application not running under administrative accounts. Some of them have been resolved, and we are ready to present version 1.2 of both the client application and our library. Some cases of incorrect license key validation are still unclear to us, and we are working on finding and fixing the cause. If you use our application and cannot activate your demo key, please create a license.key file in the folder “C:/ProgramData/DataExcavator” and copy your key into it. In this case, the application will not try to activate the key remotely. Thank you for your understanding. The most significant improvements are listed below.
- Fixed current errors and improved overall stability of the application.
- Fixed the problem of using the SSL / TLS protocols that are not available in some versions of the OS. The problem occurred when trying to set the ServicePointManager.SecurityProtocol property to certain values on certain operating systems.
- Added site login functionality as a separate behavior for CEF. If you want to extract data from a site that requires authentication by login and password, you can now do so through CEFWebsiteAuthBehavior instead of CEFBehaviors (which requires some skill). It provides a simple set of fields, including a template script. In general, this greatly simplifies working with sites that require authentication.
- Fixed the Excel export algorithm – the library was downgraded to a stable build without additional license fees (EPPlus starting from version 5 is no longer free).
- Fixed the Excel and CSV export algorithm for complex cases. Now if one of the export results cannot be written, the overall export process does NOT stop.
- Added a callback to the export mechanism that is called after each exported record. This lets you track the export as it progresses rather than waiting a long time for the program to finish.
- The behavior of the program when it is not run as administrator on server versions of the operating system remains an open question. For now, if you run into any problems, we recommend running the application as administrator.
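The two export changes in the list above (continuing after a failed record and reporting progress through a callback) can be sketched in Python like this. It is an illustration only, not the library's export API: export_records, the on_record callback, and the "url" and "title" fields are hypothetical, and the sketch assumes the callback is also invoked for failed records.

```python
import csv
import io


def export_records(records, out, on_record=None):
    """Write records as CSV rows; a bad record is skipped, not fatal."""
    writer = csv.writer(out)
    exported = 0
    failed = 0
    for index, record in enumerate(records):
        try:
            # Building the row raises KeyError if a field is missing,
            # so nothing is written for a broken record.
            writer.writerow([record["url"], record["title"]])
            succeeded = True
            exported += 1
        except (KeyError, csv.Error):
            succeeded = False
            failed += 1
        if on_record is not None:
            on_record(index, succeeded)  # progress notification after every record
    return exported, failed


buffer = io.StringIO()
summary = export_records(
    [{"url": "a", "title": "A"}, {"url": "b"}, {"url": "c", "title": "C"}],
    buffer,
    on_record=lambda index, ok: print(f"record {index}: {'ok' if ok else 'failed'}"),
)
print(summary)
```

Returning the counts of exported and failed records lets the caller report a summary at the end, while the callback keeps the interface responsive during long exports.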