Even after so much has been said and written about web scraping, many people still view online data collection negatively. Some argue that it should be stopped because they consider it illegal, while others regard it with suspicion because they do not know how to deal with it. Although a number of people have benefited greatly from this practice, and some have even earned millions by gathering and providing scraped data, there are real issues that need to be addressed properly, presented clearly, and understood.
If you want to present data from online sites without fear of legal consequences or negative responses, you simply have to do it the right way, just as you would when doing research offline. Some “dos and don’ts” of web scraping are: Don’t claim someone else’s work as your own; Do your data collection completely and carefully; Verify sources; and If in doubt, don’t.
Don’t Claim Someone Else’s Work as Your Own
The cardinal rule in any form of data collection or research is to never claim someone else’s work as your own. Ignoring the source, or forgetting to acknowledge it, amounts to theft. Failing to give credit where it is due is just as damaging.
Because information flows in and out so quickly, and because data mining services are so demanding, there may be times when you inadvertently fail to identify the source of some information and still use or publish the collected data. This may appear to be a valid reason or excuse, but whether the omission is intentional or not, the information is owned by somebody else. Even where the law on intellectual property rights over online material is unclear or varies from place to place, it is unethical to use others’ work without giving them the credit due to them.
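One way to guard against losing track of a source is to make it impossible to store collected data without one. The following is only a minimal sketch in Python, not a prescribed method: the names `ScrapedRecord`, `make_record`, and `credit_line` are hypothetical and chosen purely for illustration, and the URL is a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ScrapedRecord:
    """A piece of collected data that always travels with its source."""
    content: str
    source_url: str
    retrieved_at: str  # ISO 8601 timestamp in UTC


def make_record(content: str, source_url: str) -> ScrapedRecord:
    """Create a record, refusing any data that has no source attached."""
    cleaned_url = source_url.strip()
    if not cleaned_url:
        raise ValueError("refusing to store data without a source URL")
    return ScrapedRecord(
        content=content,
        source_url=cleaned_url,
        retrieved_at=datetime.now(timezone.utc).isoformat(),
    )


def credit_line(record: ScrapedRecord) -> str:
    """Build a short attribution line to publish alongside the data."""
    retrieved_date = record.retrieved_at[:10]  # keep only YYYY-MM-DD
    return f"Source: {record.source_url} (retrieved {retrieved_date})"


# Hypothetical usage with a placeholder URL.
record = make_record("Some quoted passage", "https://example.com/article")
print(credit_line(record))
```

Because the source is stored with the data at the moment of collection, the credit is still available later, even when the volume of information makes it easy to forget where something came from.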