One of the main problems with data cleaning is the connotation the words give off. Data cleaning refers to the process of wiping away data that deviates from the trend set by the rest of the dataset. The name does not do justice to how crucial the practice is in the research process; it gives the impression that data needs to be “fixed” or “altered” in order to be correct. A better approach would be to rename the practice after something it more closely resembles. A term such as “data-keeping” works better, since it evokes housekeeping, which is a more accurate description of what is actually being done during data cleaning.

Unfortunately, one key difference between housekeeping and data cleaning is that data cleaning does not remove only “bad” data. The process can also eliminate data that is significant precisely because of its uniqueness, which can make the results of a study less representative of the real world.

A possible solution is to highlight unique data points instead of hiding them, for example by putting a spotlight on regional or even spatial differences. The researchers may not know why these differences occur, but that leaves room for more research on the topic in the future. This opens up possibilities for discoveries that would have been lost if the original data had been “cleaned” away.
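The flag-instead-of-delete idea above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the `readings` values are invented, and the two-standard-deviation z-score cutoff is just one common, assumed rule of thumb for spotting unusual points.

```python
from statistics import mean, stdev

# Hypothetical sensor readings; one value sits far from the trend.
readings = [10.1, 9.8, 10.3, 10.0, 25.6, 9.9, 10.2]

def flag_outliers(values, threshold=2.0):
    """Return (value, is_outlier) pairs using a simple z-score rule.

    Nothing is deleted: unusual points are labeled so they can be
    examined later (e.g., for regional differences) rather than hidden.
    """
    m, s = mean(values), stdev(values)
    return [(v, abs(v - m) / s > threshold) for v in values]

flagged = flag_outliers(readings)
# The unusual reading (25.6) stays in the dataset, marked for follow-up.
```

Because every point survives with a label attached, a later study can revisit the flagged values instead of losing them to the cleaning step.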