The internet can reasonably be regarded as the single largest source of information, largely because of how it operates: information can be published and accessed easily and almost instantly. Users reach that information by following URLs and hyperlinks. The World Wide Web holds billions of web pages, so locating useful information by hand is not an easy task. For this reason, web data mining is used to extract information from the internet.
The daily growth of data on the web makes extracting important patterns a complicated task. This can be attributed to the difficulty of imposing order on semi-structured and unstructured content. Internet data also keeps evolving, which makes it distinct from print documents. For this reason, managing it in databases has become a difficult task. Web data mining is therefore an important process for mining data online.
Web data mining uses data mining tools and expertise to discover and retrieve information from the web. The process can be divided into the following four sub-tasks:
1- Resource finding. This first step involves retrieving data from both online and offline sources. Internet sources include website content, HTML documents, and newsletters.
2- Information selection. This step is also called pre-processing. Once the data has been extracted from the internet, it must be transformed into a usable form. This involves removing stop words and then arranging the data in a logical order.
3- Generalization. In this step, the general patterns and trends within the target website and across multiple websites are identified. Data mining companies and experts carry out this step.
4- Analysis. This step involves validating the information derived from the data, identifying patterns, and finally interpreting those patterns.
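The first three sub-tasks above can be illustrated with a minimal sketch using only the Python standard library. Everything named here is an assumption made for illustration: the `PAGES` list stands in for documents that would normally be retrieved from the web, `STOP_WORDS` is a deliberately tiny hypothetical list, and counting how many pages a term appears in is just one simple way to look for patterns across sites, not a prescribed method.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on", "for"}

# Stand-in for the "resource finding" step: pages are given as strings
# here instead of being downloaded.
PAGES = [
    "<html><body><h1>Web mining</h1><p>Mining the web for patterns.</p></body></html>",
    "<html><body><p>Patterns in web data.</p><script>var x = 1;</script></body></html>",
]


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def preprocess(text):
    """Information selection: lowercase, tokenize, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def common_terms(pages, top_n=3):
    """Generalization: rank terms by the number of pages they appear in."""
    doc_freq = Counter()
    for html in pages:
        # set() so each page counts a term at most once
        doc_freq.update(set(preprocess(extract_text(html))))
    return doc_freq.most_common(top_n)


if __name__ == "__main__":
    print(common_terms(PAGES))
```

The analysis step is left out on purpose: validating and interpreting the resulting term counts depends on the question being asked and is usually done by a person or a further, task-specific model.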
Three main factors in web data mining influence the perception and evaluation process:
1- Web page design. The design and layout of a website determine which data mining process is undertaken. The way the site is linked to other websites is also an important consideration.
2- Web page content. Everyone who searches for information online is looking for content, so the content of a website should be rich and informative. It is therefore important that web scraping is undertaken on websites regarded as having important information.
3- Website design and structure. Another important factor is the number of pages and the kind of publishing platform being used.
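As a rough illustration of how the linking and structure factors above might be inspected, the sketch below collects the hyperlinks on a page and separates links within the same site from links to other websites. The class name `LinkCollector`, the function `split_links`, and the example URLs are all hypothetical, and comparing host names is only one simple assumption about what counts as "the same website".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Gathers absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL
                self.links.append(urljoin(self.base_url, value))


def split_links(html, base_url):
    """Return (internal, external) lists of http(s) links found in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    host = urlparse(base_url).netloc
    # Ignore non-web schemes such as mailto:
    web_links = [u for u in collector.links
                 if urlparse(u).scheme in ("http", "https")]
    internal = [u for u in web_links if urlparse(u).netloc == host]
    external = [u for u in web_links if urlparse(u).netloc != host]
    return internal, external


if __name__ == "__main__":
    sample = ('<a href="/about">About</a>'
              '<a href="https://other.example/page">Other</a>'
              '<a href="mailto:info@site.example">Mail</a>')
    print(split_links(sample, "https://site.example/index.html"))
```

Counting the internal links discovered while visiting pages gives a first estimate of how many pages a site has, and the external links show how it is connected to other websites.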