Are you afraid to delete information? Consider it again

Many organizations follow a policy of never deleting any of their data. Yet keeping all data around can become costly as data volumes continue to rise. Much of it is redundant, obsolete, or trivial (ROT) data: a survey from Splunk found that 60% of organizations believe that half or more of their data is dark, meaning its value is unknown.

As companies face mounting threats of ransomware and cyberattacks, some obsolete data may be dangerous: it may be underprotected yet still valuable to hackers. In addition, internal policies or industry regulations may require organizations to delete certain data, such as ex-employee records, financial data, or personally identifiable information (PII), after a set retention period.

Another problem with storing large amounts of outdated data is that it clutters file servers, reducing productivity. According to a Wakefield Research survey conducted in 2021, 54% of office workers in the United States said they spend more time searching for documents and files than responding to emails and messages.

Every file must earn its place down to the last byte, and data should not be deleted prematurely if it still has value. But when data becomes obsolete, there should be a methodical way to confine and then delete it.

Obstacles to data deletion

Cultural: We are all data hoarders by nature. Without analytics to help us understand what data has truly become obsolete, it's difficult to change an organizational mindset of retaining all data forever. That mindset is no longer sustainable, given the enormous growth in recent years of unstructured data, from genomics and medical imaging to streaming video, electric cars, and IoT products.

Legal/regulatory: Some data must be retained for a defined period, though rarely forever. In other cases, such as PII, corporate policy or regulation may dictate that data can only be retained for a limited time.

Lack of systematic methods to understand data usage: Manually figuring out what data is no longer required, and then motivating users to act on it, is so laborious and time-consuming that it never gets completed.

Data deletion advice in the United States

Developing a sustainable data lifecycle management strategy requires good analytics. You'll want to understand data usage to determine what may be deleted, based on data types (such as interim data) and usage patterns (such as the time of last access). This also helps gain buy-in from business users, because deletion is then based on objective criteria rather than subjective decisions.

With this knowledge, you can plan how data will move over time: from primary storage to cloud-based storage, then into confinement (kept out of the user space in a hidden location), and finally to deletion.
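As an illustration only, this tiering can be expressed as a simple age-based classification. The thresholds, tier names, and function below are hypothetical, not any specific product's policy:

```python
from datetime import datetime, timedelta

# Hypothetical age thresholds (days since last access) for each lifecycle stage.
POLICY = [
    (90, "primary"),    # accessed within 90 days: keep on primary storage
    (365, "cloud"),     # 90-365 days: move to lower-cost cloud storage
    (730, "confined"),  # 1-2 years: confine (hidden from users, still recoverable)
]
FINAL_STAGE = "delete"  # beyond the last threshold: eligible for deletion

def lifecycle_stage(last_access: datetime, now: datetime) -> str:
    """Return the lifecycle stage for a file based on its last access time."""
    age_days = (now - last_access).days
    for max_age, stage in POLICY:
        if age_days <= max_age:
            return stage
    return FINAL_STAGE

now = datetime(2022, 6, 1)
print(lifecycle_stage(now - timedelta(days=30), now))   # primary
print(lifecycle_stage(now - timedelta(days=800), now))  # delete
```

Expressing the policy as data rather than code makes it easy to adjust the thresholds per dataset once stakeholders agree on the criteria.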

Regulations, the potential long-term value of data, and the cost of storage and backups at every stage, from primary to archive storage, are among the factors that may shape the policy. These decisions can have serious consequences if, say, datasets are deleted and then needed later for analytics or forecasting.

Data owners should weigh the costs versus the benefits of retaining data for a given workload or dataset. Either way, communicate data usage and the policy with stakeholders so they understand when data will expire and whether it will first be held in a confined, not-yet-deleted state. Confinement makes it easier for users to accept data deletion workflows once they realize they can unconfine data within the grace period.

For long-term data that must be retained, make sure users understand the cost and any extra steps required to access it from deep archival storage. For example, data committed to AWS Glacier Deep Archive may take several hours to retrieve, and egress fees will often apply.

Deleting data is not a zero-cost process. We usually think only of read/write speeds, but deletion consumes system performance as well. Take this example from a theme park: roughly 100,000 guest photos per day are retained for up to 30 days after the customer has left the park. From day 30 onward, the storage system must ingest 100,000 new photos and delete 100,000 expired ones every day.
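The arithmetic is worth sketching out. Using the figures from the theme-park example above (everything else here is illustrative):

```python
DAILY_INGEST = 100_000   # new guest photos per day
RETENTION_DAYS = 30      # photos kept for 30 days after the visit

# Photos resident in the system once retention kicks in (steady state).
resident = DAILY_INGEST * RETENTION_DAYS

# From day 30 on, the system must delete a full day's worth of photos
# every day, on top of ingesting the same amount, just to keep the
# storage footprint flat.
daily_deletes = DAILY_INGEST

print(f"steady-state photos stored: {resident:,}")   # 3,000,000
print(f"deletes required per day:   {daily_deletes:,}")
```

In other words, the delete workload at steady state is as large as the ingest workload, which is why it must be budgeted for in capacity planning.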

Lazy deletes can help by deprioritizing the delete workload, but if the system can't delete data at least as fast as new data is ingested, you will need to add storage just to hold expired data. In scale-out systems, you may even need to add nodes to keep up with deletes.
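A quick simulation shows why: if delete capacity is throttled below the expiry rate (both figures below are hypothetical), the backlog of expired-but-undeleted data grows without bound:

```python
expiry_per_day = 100_000          # items expiring per day (matches ingest at steady state)
delete_capacity_per_day = 80_000  # lazy deletes throttled below the expiry rate

backlog = 0  # expired items still occupying storage
for day in range(30):
    backlog += expiry_per_day
    backlog -= min(backlog, delete_capacity_per_day)

# After 30 days the backlog is 600,000 expired items, and it keeps
# growing by 20,000 items every day the shortfall persists.
print(f"{backlog:,}")
```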

A better strategy is to move cold data out of the primary file system first, then confine and delete it there, reducing the load and performance impact on the active file system.

Once a policy has been established for each dataset, you'll need a strategy for execution. An independent data management platform provides a unified approach covering all data sources and storage technologies, while also reducing data management tasks. Collaboration between IT and line-of-business (LOB) teams is an integral part of execution, and it leads to less friction when LOB teams feel they have a voice in data management.

Given the current data growth trajectory (worldwide data is projected to nearly double, from 97 ZB in 2022 to 181 ZB in 2025), organizations have little choice but to revisit data deletion policies and find a way to delete more data than they have in the past.

Without the right tools and collaboration, this can turn into a political conflict. Yet by including data deletion as another well-planned component of the data management strategy, IT will have a more manageable data environment that provides better user experiences and value for money spent on storage, backups, and data protection.

Kumar Goswami is the CEO and cofounder of Komprise.
