
Patent Filed (US & India) - Smart Duplicate Data Processing



The video demos the prototype application developed for this patent.

FIELD OF THE INVENTION
The present invention relates generally to systems and methods that are configured to detect, monitor, and manage data redundancy in a user-specified environment.

BACKGROUND
To improve storage utilization and reduce costs, a conventional system may run a data deduplication process to deal with redundant data.  The two general deduplication techniques are in-line deduplication and post-process deduplication.

In-line deduplication decides, before a new block of data is saved, whether a duplicate of that block already exists in the system.  If a duplicate block is already stored, the system creates an instance that references the stored block and does not save the new block.  In-line deduplication therefore prevents redundant data from ever being stored in the system.
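As an illustration of the in-line approach, here is a minimal Python sketch that deduplicates at write time. It assumes a SHA-256 digest identifies a block's contents; the class name `InlineDedupStore` is hypothetical and not part of the patent. A production system would typically also guard against hash collisions, for example by comparing bytes whenever digests match.

```python
import hashlib


class InlineDedupStore:
    """Hypothetical toy block store that deduplicates at write time."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes, each content stored once
        self.refs = []    # one entry per logical write, pointing at a digest

    def write(self, block: bytes) -> str:
        # Assumption: SHA-256 digest equality stands in for content equality.
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:
            # No duplicate exists yet, so the new block is stored.
            self.blocks[digest] = block
        # In both cases only a reference is recorded, never a second copy.
        self.refs.append(digest)
        return digest

    def read(self, index: int) -> bytes:
        return self.blocks[self.refs[index]]


store = InlineDedupStore()
store.write(b"hello")
store.write(b"world")
store.write(b"hello")
print(len(store.refs), len(store.blocks))  # 3 2
```

Three logical writes are recorded, but only two physical blocks are kept.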

Post-process deduplication stores every block as it arrives and later runs a separate pass that finds and eliminates duplicates.  Although it may temporarily store duplicate blocks, post-process deduplication lets blocks be written without the up-front processing that in-line deduplication performs.
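A post-process pass can be sketched in the same spirit: blocks are stored as they arrive, and a later sweep keeps one copy per distinct content and remaps every reference to that copy. The function `post_process_dedup` is hypothetical and again assumes SHA-256 digests as the content identity.

```python
import hashlib


def post_process_dedup(blocks):
    """Hypothetical sweep over already-stored blocks (duplicates allowed).

    Returns the unique blocks and, for each original position, the index
    of the unique copy that position should now reference.
    """
    unique = []
    first_seen = {}  # digest -> index into `unique`
    remap = []
    for block in blocks:
        # Assumption: equal SHA-256 digests mean equal contents.
        digest = hashlib.sha256(block).hexdigest()
        if digest not in first_seen:
            first_seen[digest] = len(unique)
            unique.append(block)
        remap.append(first_seen[digest])
    return unique, remap


stored = [b"a", b"b", b"a", b"c", b"b"]
unique, remap = post_process_dedup(stored)
print(unique)  # [b'a', b'b', b'c']
print(remap)   # [0, 1, 0, 2, 1]
```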

These techniques are effective at improving storage utilization on the back end while remaining hidden from users and applications.  However, they neither notify users immediately when duplicates are detected nor give users the flexibility to selectively manage, control, and reduce duplicate data in a user-specified environment.

SUMMARY
The systems and methods described herein attempt to cure the deficiencies of conventional systems by providing immediate notification of duplicate data together with the ability to selectively manage, control, and reduce the storage of each set of duplicate data.  After the storage of duplicates is reduced, users will notice no change on the front end, because each file remains available as usual.  On the back end, however, each file in the set of duplicate data will link to and reference a single memory region.

In an embodiment, a computer-implemented method is provided.  The method includes selecting a logical unit in which duplicate files are to be detected.  The logical unit includes at least a portion of a non-transitory computer readable medium having a plurality of files.  The method includes receiving a first file to be saved to the logical unit, and saving the first file to the logical unit.  The method supports a scanning mode to determine whether the logical unit contains duplicate data.  The method may identify duplicate data in the logical unit during the scan.  The duplicate data may include a first result set that includes the first file and a second file.  A linking operation may be performed on the first result set to enable a computer to access the first and second files via the same physical data in the same memory region of the logical unit.  The method also supports a monitoring mode to monitor activity that would create duplicate data in the logical unit.
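The scanning mode and linking operation might look roughly like the following sketch. The text here does not say how files come to share one memory region, so this example assumes filesystem hard links as a stand-in; `scan_for_duplicates`, `link_result_set`, and the `.linktmp` suffix are hypothetical.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_for_duplicates(root):
    """Scanning mode: group every file under `root` by content hash."""
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_digest(path)].append(path)
    # A result set is any group containing more than one file.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def link_result_set(paths):
    """Linking operation (assumed: hard links) onto the first file's data."""
    keep = paths[0]
    for other in paths[1:]:
        tmp = other + ".linktmp"  # hypothetical temporary name
        os.link(keep, tmp)        # new directory entry for keep's data
        os.replace(tmp, other)    # swap it in place of the duplicate copy


with tempfile.TemporaryDirectory() as root:
    for name, data in [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"other")]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(data)
    result_sets = scan_for_duplicates(root)
    link_result_set(result_sets[0])
    first, second = result_sets[0]
    n_sets = len(result_sets)
    linked = os.path.samefile(first, second)
    print(n_sets, linked)  # 1 True
```

One caveat of the hard-link choice: writing to one linked file in place changes every file in the result set. Copy-on-write clones, where the filesystem supports them, avoid that side effect.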

In another embodiment, a computer-implemented method for detecting duplicate files is provided.  The method includes selecting a logical unit.  The logical unit comprises at least a portion of a non-transitory computer readable medium having a plurality of files stored thereon.  The method includes identifying a set of files that have the same file contents.  The set of files includes at least one file that is stored in the logical unit.  The method includes providing a user interface that displays a listing of the set of files along with an indication as to whether the files are in a linked state or an unlinked state.  The linked state indicates that the files in the set access the same physical data in the same memory region of the logical unit, whereas the unlinked state indicates that each file in the set is stored in a distinct memory region of the logical unit.
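The linked/unlinked indication could be derived from file metadata. This sketch assumes that "same physical data" can be approximated by the same `(st_dev, st_ino)` pair reported by `os.stat`, which holds for hard links but would not detect block-level sharing; `link_state` and `listing` are hypothetical helpers, and `listing` is only a plain-text stand-in for the user interface.

```python
import os
import tempfile


def link_state(paths):
    """Return 'linked' if every path resolves to the same physical file,
    otherwise 'unlinked'.

    Assumption: the paths already form one result set (same contents),
    and (st_dev, st_ino) stands in for "same memory region".
    """
    first = os.stat(paths[0])
    for path in paths[1:]:
        st = os.stat(path)
        if (st.st_dev, st.st_ino) != (first.st_dev, first.st_ino):
            return "unlinked"
    return "linked"


def listing(result_set):
    """Plain-text stand-in for the UI listing of one result set."""
    rows = [f"[{link_state(result_set)}]"]
    rows += [f"  {path}" for path in result_set]
    return "\n".join(rows)


with tempfile.TemporaryDirectory() as root:
    a, b, c = (os.path.join(root, n) for n in ("a.txt", "b.txt", "c.txt"))
    for path in (a, b):
        with open(path, "wb") as f:
            f.write(b"same")  # identical contents, separate storage
    os.link(a, c)             # c shares a's physical data
    state_copies = link_state([a, b])
    state_links = link_state([a, c])
    print(listing([a, b]))
    print(state_copies, state_links)  # unlinked linked
```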

In yet another embodiment, a system for detecting duplicate files is provided.  The system includes a database system and at least one processor.  The database system comprises hash values that represent the file contents of files stored in non-transitory computer readable media.  The at least one processor is connected to the database system and the non-transitory computer readable media.  The at least one processor is configured to receive information that causes it to perform a number of functions.  The at least one processor is configured to identify a logical unit that includes at least a portion of the non-transitory computer readable media.  The at least one processor is configured to monitor activity to create or save an active instance to the logical unit.  Upon detecting such activity, the at least one processor is configured to determine whether the content of the active instance matches the file content of at least one file that is already saved in the logical unit.  The at least one processor is configured to provide instantaneous notification upon determining that the file contents of the active instance match the file contents of at least one duplicate file already saved in the logical unit.  The notification includes options that enable the user to manage the active instance and the at least one duplicate file.
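A minimal sketch of the database system and the monitoring check follows, using an in-memory SQLite table of SHA-256 digests. The table layout, the `HashIndex` class, the `check_before_save` hook, and the three notification options are illustrative assumptions, not details taken from the patent.

```python
import hashlib
import sqlite3


class HashIndex:
    """Hypothetical database of content hashes for one logical unit."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, digest TEXT)")
        self.db.execute("CREATE INDEX by_digest ON files (digest)")

    def add(self, path, content):
        """Record a file after it has actually been saved."""
        digest = hashlib.sha256(content).hexdigest()
        self.db.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", (path, digest))
        return digest

    def matches(self, content):
        """Paths of saved files whose digest equals that of `content`."""
        digest = hashlib.sha256(content).hexdigest()
        rows = self.db.execute(
            "SELECT path FROM files WHERE digest = ? ORDER BY path", (digest,)
        )
        return [row[0] for row in rows]


def check_before_save(index, path, content):
    """Monitoring hook for an active instance about to be saved.

    Returns None when no duplicate exists; otherwise a notification with
    hypothetical options. The index is not modified here, so the caller
    can decide what to record after the user chooses.
    """
    duplicates = [p for p in index.matches(content) if p != path]
    if not duplicates:
        return None
    return {
        "path": path,
        "duplicates": duplicates,
        "options": ["keep separate", "link to existing", "cancel save"],
    }


index = HashIndex()
index.add("/unit/report.docx", b"quarterly numbers")
note = check_before_save(index, "/unit/report-copy.docx", b"quarterly numbers")
print(note["duplicates"])  # ['/unit/report.docx']
print(check_before_save(index, "/unit/notes.txt", b"something new"))  # None
```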

Additional features and advantages of an embodiment will be set forth in the description which follows, and in part will be apparent from the description.  The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the exemplary embodiments in the written description and claims hereof as well as the appended drawings.


It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.

ABSTRACT
A computer-implemented method for detecting duplicate files includes selecting a logical unit in which duplicate files are to be detected.  The logical unit includes at least a portion of a non-transitory computer readable medium having a plurality of files.  The method includes receiving a first file to be saved to the logical unit, and saving the first file to the logical unit.  The method supports a scanning mode to determine if the logical unit contains duplicate data.  The method may identify duplicate data in the logical unit during the scan.  The duplicate data may include a first result set that includes the first file and a second file. A linking operation may be performed on the first result set to enable a computer to access the first and second files via the same physical data in the same memory region of the logical unit.  The method also supports a monitoring mode to monitor activity that would create duplicate data in the logical unit.

SPECIAL THANKS
I would like to thank Shaik Nisaruddin, Satish Kumar Govindaraju, and Neha Jain for their eminent contributions and support.
I would also like to extend special thanks to Milind Halageri, Shwetha Sreedharan, and Namrata Dessai Shetgaonkar for their valuable contributions.
I would also like to thank Unisys for its support in making this possible and, moreover, a reality.  Unisys has given each individual the ability to showcase their talents and leverage their innovation, dedication, and commitment.

Note: The patent is still pending.

Read up on how the idea came about: http://vireal.blogspot.in/p/an-idea-day-keeps-brain-ticking-away.html

