Patent Filed (US & India) - Smart Duplicate Data Processing

The video demos the prototype application developed for this patent.

FIELD OF THE INVENTION

The present invention relates generally to systems and methods that are configured to detect, monitor, and manage data redundancy in a user-specified environment.

BACKGROUND

To improve storage utilization and reduce costs, a conventional system may perform a data deduplication process to deal with data redundancy. The two general data deduplication techniques are in-line deduplication and post-process deduplication.

In-line deduplication is a process that involves determining whether a new block of data should be saved on the system based on a determination as to whether a duplicate block of data already exists in the system. If a duplicate block is already stored in the system, then the system creates an instance that references the stored block, and does not save the new block. In-line deduplication therefore prevents redundant data from being stored in the system.
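The in-line behavior described above can be illustrated with a minimal sketch. The class name `InlineDedupStore`, the use of SHA-256 digests as block identity, and the in-memory dictionaries are all assumptions made for illustration; they are not taken from the patent or from any particular storage product.

```python
import hashlib


class InlineDedupStore:
    """Toy block store that deduplicates at write time (hypothetical)."""

    def __init__(self):
        self.blocks = {}      # digest -> block bytes, each stored once
        self.references = []  # one digest per logical write

    def write(self, block: bytes) -> bool:
        """Record a write; return True only if new physical data was stored."""
        digest = hashlib.sha256(block).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = block
        # A duplicate write only adds a reference to the stored block.
        self.references.append(digest)
        return is_new

    def read(self, index: int) -> bytes:
        """Return the data for the index-th logical write."""
        return self.blocks[self.references[index]]


store = InlineDedupStore()
results = [store.write(b"alpha"), store.write(b"beta"), store.write(b"alpha")]
```

Here the third write stores nothing new, yet `store.read(2)` still returns the expected data through the reference.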

Post-process deduplication is a process that involves storing each block in the system and then implementing a method to delete and eliminate duplicates on the system. Although post-process deduplication may initially store duplicate blocks, post-process deduplication allows blocks to be stored without the initial processing that occurs with in-line deduplication.
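By contrast, a post-process pass runs over data that has already been written. The sketch below assumes the stored blocks are available as a plain list and again uses SHA-256 digests as a stand-in for block identity; the function name `post_process_dedup` is hypothetical.

```python
import hashlib


def post_process_dedup(stored_blocks):
    """Collapse an already-stored list of blocks.

    Returns (unique_blocks, references), where references[i] is the index
    into unique_blocks that now backs the i-th original block.
    """
    unique_blocks = []
    index_by_digest = {}
    references = []
    for block in stored_blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in index_by_digest:
            # First time this content is seen: keep one physical copy.
            index_by_digest[digest] = len(unique_blocks)
            unique_blocks.append(block)
        references.append(index_by_digest[digest])
    return unique_blocks, references


unique, refs = post_process_dedup([b"alpha", b"beta", b"alpha", b"alpha"])
```

All four blocks are accepted up front without any lookup cost; the duplicates are removed only when the pass runs later.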

These techniques are effective in improving storage utilization from the back-end, while being hidden from users and applications. However, these techniques do not provide users with instant notification along with the flexibility to selectively manage, control, and reduce duplicate data in a user-specified environment upon detection.

SUMMARY

The systems and methods described herein attempt to cure the deficiencies of conventional systems by providing immediate notification of duplicate data together with the ability to selectively manage, control, and reduce the storage of each set of duplicate data. Upon implementing a procedure to reduce the storage of duplicates, users will not notice a change from the front-end as each file will be available as usual. However, from the back-end, each file from the set of duplicate data will be configured to link to and reference a single memory region.

In an embodiment, a computer-implemented method is provided. The method includes selecting a logical unit in which duplicate files are to be detected. The logical unit includes at least a portion of a non-transitory computer readable medium having a plurality of files. The method includes receiving a first file to be saved to the logical unit, and saving the first file to the logical unit. The method supports a scanning mode to determine if the logical unit contains duplicate data. The method may identify duplicate data in the logical unit during the scan. The duplicate data may include a first result set that includes the first file and a second file. A linking operation may be performed on the first result set to enable a computer to access the first and second files via the same physical data in the same memory region of the logical unit. The method also supports a monitoring mode to monitor activity that would create duplicate data in the logical unit.
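One way to picture the scanning mode and the linking operation is the sketch below. It makes several assumptions that the text above does not state: the logical unit is modeled as a directory tree, duplicates are found by comparing SHA-256 digests of file contents, and "the same physical data" is approximated with file system hard links. The function names are hypothetical. Note that with hard links, a later in-place edit to one file would be visible through the others, so a real system would need its own policy for that case.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_digest(path):
    """Hash a file's contents in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_for_duplicates(root):
    """Scanning mode: return result sets of paths with identical contents."""
    groups = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                groups[file_digest(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def link_result_set(paths):
    """Linking operation: make every path share the first file's data."""
    keeper = paths[0]
    for dup in paths[1:]:
        if os.path.samefile(keeper, dup):
            continue  # already linked
        tmp = dup + ".dedup-tmp"
        os.link(keeper, tmp)   # new name for the keeper's data
        os.replace(tmp, dup)   # swap it in place of the duplicate


def demo():
    with tempfile.TemporaryDirectory() as root:
        files = [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"other")]
        for name, data in files:
            with open(os.path.join(root, name), "wb") as f:
                f.write(data)
        result_sets = scan_for_duplicates(root)
        for paths in result_sets:
            link_result_set(paths)
        first = result_sets[0]
        names = [os.path.basename(p) for p in first]
        shared = os.path.samefile(first[0], first[1])
        with open(first[1], "rb") as f:
            content = f.read()
    return names, shared, content


names, shared, content = demo()
```

After linking, both file names remain visible and readable as before, which matches the front-end behavior described in the summary.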

In another embodiment, a computer-implemented method for detecting duplicate files is provided. The method includes selecting a logical unit. The logical unit comprises at least a portion of a non-transitory computer readable medium having a plurality of files stored thereon. The method includes identifying a set of files that have the same file contents. The set of files includes at least one file that is stored in the logical unit. The method includes providing a user interface that displays a listing of the set of files along with an indication as to whether the files are in a linked state or an unlinked state. The linked state indicates that the set of files are accessing the same physical data in the same memory region of the logical unit, whereas the unlinked state indicates that each file of the set is stored in distinct memory regions of the logical unit.
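The linked and unlinked states shown in such a listing could be derived as in the following sketch. It assumes, purely for illustration, that linked files are hard links, so two paths are "linked" when they report the same device and inode numbers; the helper name `link_state` is hypothetical.

```python
import os
import tempfile


def link_state(paths):
    """Return 'linked' if all paths share one inode, else 'unlinked'."""
    identities = set()
    for path in paths:
        info = os.stat(path)
        identities.add((info.st_dev, info.st_ino))
    return "linked" if len(identities) == 1 else "unlinked"


def demo():
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "report.txt")
        copy = os.path.join(root, "report-copy.txt")
        link = os.path.join(root, "report-link.txt")
        # Two independent files with the same contents.
        for path in (original, copy):
            with open(path, "wb") as f:
                f.write(b"same contents")
        # A third name that shares the original's physical data.
        os.link(original, link)
        return link_state([original, copy]), link_state([original, link])


unlinked_state, linked_state = demo()
```

A user interface would show the first pair as unlinked, since the files are stored separately, and the second pair as linked.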

In yet another embodiment, a system for detecting duplicate files is provided. The system includes a database system and at least one processor. The database system comprises hash values that represent file contents of files stored in non-transitory computer readable media. The at least one processor of the system is connected to the database system and the non-transitory computer readable media. The at least one processor is configured to receive information that causes the at least one processor to perform a number of functions. The at least one processor is configured to identify a logical unit that includes at least a portion of the non-transitory computer readable media. The at least one processor is configured to monitor activity to either create or save an active instance to the logical unit. The at least one processor is configured to determine whether content of the active instance matches file content of at least one file that is already saved in the logical unit upon detecting activity to create or save the active instance to the logical unit. The at least one processor is configured to provide instantaneous notification upon determining that the file contents of the active instance match the file contents of at least one duplicate file that is already saved in the logical unit. The notification includes options that enable the user to manage the active instance and the at least one duplicate file.
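The monitoring flow in this embodiment can be sketched as a lookup against a hash index at save time. Everything concrete here is an assumption for illustration: the database of hash values is modeled as an in-memory dictionary, SHA-256 is used as the hash, and the class `HashIndex`, the function `check_save`, and the option names in the notification are hypothetical.

```python
import hashlib


class HashIndex:
    """In-memory stand-in for the database of content hashes (hypothetical)."""

    def __init__(self):
        self._paths_by_digest = {}

    @staticmethod
    def digest(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def register(self, path: str, content: bytes) -> None:
        """Record that a file with this content is saved at path."""
        key = self.digest(content)
        self._paths_by_digest.setdefault(key, []).append(path)

    def matches(self, content: bytes) -> list:
        """Return paths of saved files whose content hash matches."""
        return list(self._paths_by_digest.get(self.digest(content), []))


def check_save(index: HashIndex, path: str, content: bytes):
    """Return a notification if saving would create duplicate data, else None."""
    duplicates = index.matches(content)
    if not duplicates:
        return None
    return {
        "active_instance": path,
        "duplicates": duplicates,
        # Hypothetical option names; the text only says options are offered.
        "options": ["save_as_separate_copy", "link_to_existing", "cancel"],
    }


index = HashIndex()
index.register("reports/q1.txt", b"quarterly numbers")
first = check_save(index, "drafts/notes.txt", b"new text")
second = check_save(index, "drafts/q1-copy.txt", b"quarterly numbers")
```

The first save raises no notification, while the second reports `reports/q1.txt` as an existing duplicate and leaves the choice to the user.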

Additional features and advantages of an embodiment will be set forth in the description which follows, and in part will be apparent from the description. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the exemplary embodiments in the written description and claims hereof as well as the appended drawings.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.

ABSTRACT

A computer-implemented method for detecting duplicate files includes selecting a logical unit in which duplicate files are to be detected. The logical unit includes at least a portion of a non-transitory computer readable medium having a plurality of files. The method includes receiving a first file to be saved to the logical unit, and saving the first file to the logical unit. The method supports a scanning mode to determine if the logical unit contains duplicate data. The method may identify duplicate data in the logical unit during the scan. The duplicate data may include a first result set that includes the first file and a second file. A linking operation may be performed on the first result set to enable a computer to access the first and second files via the same physical data in the same memory region of the logical unit. The method also supports a monitoring mode to monitor activity that would create duplicate data in the logical unit.

SPECIAL THANKS

I would like to thank the following people for their eminent contribution and support: Shaik Nisaruddin, Satish Kumar Govindaraju, and Neha Jain.

I would also like to add special thanks to Milind Halageri, Shwetha Sreedharan, and Namrata Dessai Shetgaonkar for their valuable contributions.

I would also like to thank Unisys for its support in making this possible and, moreover, a reality. Unisys has given each individual the opportunity to showcase their talents and leverage their innovation, dedication, and commitment.

Note: The patent application is still in process.

Read up on how the idea came about: http://vireal.blogspot.in/p/an-idea-day-keeps-brain-ticking-away.html
