The video demos the prototype application developed for this patent.
FIELD OF THE INVENTION
The present invention relates generally to
systems and methods that are configured to detect, monitor, and manage data
redundancy in a user specified environment.
BACKGROUND
In order to improve storage utilization and
costs, a conventional system may perform a data deduplication process to deal
with data redundancy. Generally, the two main
data deduplication techniques are in-line deduplication and post-process
deduplication.
In-line deduplication is a process that involves
determining whether a new block of data should be saved on the system based on
a determination as to whether a duplicate block of data already exists in the
system. If a duplicate block is already
stored in the system, then the system creates an instance that references the
stored block, and does not save the new block.
In-line deduplication therefore prevents redundant data from being
stored in the system.
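As an illustration of the in-line approach described above, the following is a minimal sketch rather than any particular system's implementation. It assumes that blocks are compared by SHA-256 digest; the class name `InlineDedupStore` and its attributes are hypothetical.

```python
import hashlib


class InlineDedupStore:
    """Toy block store that deduplicates at write time (in-line).

    Assumption: two blocks are treated as duplicates when their
    SHA-256 digests match.
    """

    def __init__(self):
        self.blocks = {}      # digest -> block bytes, stored once
        self.references = []  # one digest per logical write

    def write(self, block: bytes) -> bool:
        """Record a write; return True only if new physical data was stored."""
        digest = hashlib.sha256(block).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = block
        # Every write gets a reference, whether or not the block was new.
        self.references.append(digest)
        return is_new


store = InlineDedupStore()
results = [store.write(b"alpha"), store.write(b"beta"), store.write(b"alpha")]
```

In this sketch the third write only adds a reference, so two blocks are stored for three logical writes.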
Post-process deduplication is a process that
involves storing each block in the system and then implementing a method to
delete and eliminate duplicates on the system.
Although post-process deduplication may initially store duplicate
blocks, post-process deduplication allows blocks to be stored without the
initial processing that occurs with in-line deduplication.
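The post-process approach can be sketched in the same spirit. The example below assumes that all blocks have already been stored as-is and that duplicates are detected by SHA-256 digest in a later pass; the function name `post_process_dedup` is hypothetical.

```python
import hashlib


def post_process_dedup(stored_blocks):
    """Collapse already-stored blocks to unique ones in a later pass.

    Returns (unique_blocks, references), where references[i] is the index
    into unique_blocks that logical block i points to after the pass.
    """
    index_by_digest = {}
    unique_blocks = []
    references = []
    for block in stored_blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in index_by_digest:
            # First time this content is seen: keep the physical block.
            index_by_digest[digest] = len(unique_blocks)
            unique_blocks.append(block)
        references.append(index_by_digest[digest])
    return unique_blocks, references


unique, refs = post_process_dedup([b"alpha", b"beta", b"alpha", b"alpha"])
```

Here the write path does no hashing at all; the cost is paid later, when the pass reduces four stored blocks to two.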
These techniques are effective in improving
storage utilization from the back-end, while being hidden from users and
applications. However, these techniques
do not provide users with instant notification along with the flexibility to
selectively manage, control, and reduce duplicate data in a user specified
environment upon detection.
SUMMARY
The systems and methods described herein attempt
to cure the deficiencies of conventional systems by providing immediate notification
of duplicate data together with the ability to selectively manage, control, and
reduce the storage of each set of duplicate data. Upon implementing a procedure to reduce the
storage of duplicates, users will not notice a change from the front-end as
each file will be available as usual.
However, from the back-end, each file from the set of duplicate data
will be configured to link to and reference a single memory region.
In an embodiment, a computer-implemented method is provided. The method includes selecting a logical unit
in which duplicate files are to be detected.
The logical unit includes at least a portion of a non-transitory
computer readable medium having a plurality of files. The method includes receiving a first file to
be saved to the logical unit, and saving the first file to the logical
unit. The method supports a scanning
mode to determine if the logical unit contains duplicate data. The method may identify duplicate data in the
logical unit during the scan. The
duplicate data may include a first result set that includes the first file and
a second file. A linking operation may be performed on the first result set to
enable a computer to access the first and second files via the same physical
data in the same memory region of the logical unit. The method also supports a monitoring mode to
monitor activity that would create duplicate data in the logical unit.
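The scanning mode and linking operation of this embodiment might be sketched as follows. This is only an illustration under stated assumptions: the logical unit is taken to be a directory tree, duplicates are grouped by SHA-256 digest, and the linking operation is realised with file system hard links (`os.link`). The text above does not name a specific linking mechanism, and the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Hash a file's contents in chunks so large files are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_for_duplicates(root):
    """Scanning mode: return result sets of paths with identical contents."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                groups[file_digest(path)].append(path)
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)


def link_result_set(paths):
    """Linking operation (assumed: hard links) so all paths share one copy."""
    keeper = paths[0]
    for duplicate in paths[1:]:
        if os.path.samefile(keeper, duplicate):
            continue  # already linked to the same physical data
        temp = duplicate + ".dedup-tmp"
        os.link(keeper, temp)        # new name for the keeper's data
        os.replace(temp, duplicate)  # swap it in place of the duplicate
```

After `link_result_set` runs on a result set, each file still appears under its own name, but the names refer to a single stored copy. Note that with hard links a later in-place edit through one name is visible through the others, so a real system would need a policy for modified files.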
In another embodiment, a computer-implemented
method for detecting duplicate files is provided. The method includes selecting a logical
unit. The logical unit comprises at
least a portion of a non-transitory computer readable medium having a plurality
of files stored thereon. The method
includes identifying a set of files that have the same file contents. The set of files includes at least one file
that is stored in the logical unit. The
method includes providing a user interface that displays a listing of the set
of files along with an indication as to whether the files are in a linked state
or an unlinked state. The linked state
indicates that the set of files are accessing the same physical data in the
same memory region of the logical unit, whereas the unlinked state indicates
that each file of the set is stored in distinct memory regions of the logical
unit.
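The linked and unlinked states of this embodiment could be derived as in the sketch below. It assumes, purely for illustration, that linking is done with hard links, so that files in the linked state share one device and inode pair; the function names `link_state` and `format_listing` are hypothetical.

```python
import os


def link_state(paths):
    """Return "linked" if every path refers to the same physical data.

    Assumption: linked files are hard links, so they share (st_dev, st_ino).
    """
    identities = set()
    for path in paths:
        info = os.stat(path)
        identities.add((info.st_dev, info.st_ino))
    return "linked" if len(identities) == 1 else "unlinked"


def format_listing(paths):
    """Build the lines a user interface might show for one set of files."""
    state = link_state(paths)
    return [f"{path}  [{state}]" for path in paths]
```

A user interface would call `format_listing` once per set of files with identical contents and display the returned lines.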
In yet another embodiment, a system for
detecting duplicate files is provided.
The system includes a database system and at least one processor. The database system comprises hash values
that represent file contents of files stored in non-transitory computer
readable media. The at least one
processor of the system is connected to the database system and the
non-transitory computer readable media.
The at least one processor is configured to receive information that
causes the at least one processor to perform a number of functions. The at least one processor is configured to
identify a logical unit that includes at least a portion of the non-transitory
computer readable media. The at least
one processor is configured to monitor activity to either create or save an
active instance to the logical unit. The
at least one processor is configured to determine whether content of the active
instance matches file content of at least one file that is already saved in
the logical unit upon detecting activity to create or save the active instance
to the logical unit. The at least one
processor is configured to provide instantaneous notification upon determining
that the file contents of the active instance already matches the file contents
of at least one duplicate file that is already saved in the logical unit. The notification includes options that enable
the user to manage the active instance and the at least one duplicate file.
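The hash database and the notification step of this embodiment might look like the following sketch. It assumes an in-memory SQLite table as the database system and SHA-256 as the hash; the table name, function names, and the three option labels are hypothetical, since the text above does not enumerate the options offered to the user.

```python
import hashlib
import sqlite3


def open_hash_db():
    """Create the (assumed) database of content hashes for saved files."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_hashes (path TEXT PRIMARY KEY, digest TEXT NOT NULL)"
    )
    return conn


def register_file(conn, path, content):
    """Record the content hash of a file saved in the logical unit."""
    digest = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO file_hashes (path, digest) VALUES (?, ?)",
        (path, digest),
    )


def check_active_instance(conn, path, content):
    """Return a notification if the content being saved already exists."""
    digest = hashlib.sha256(content).hexdigest()
    rows = conn.execute(
        "SELECT path FROM file_hashes WHERE digest = ? AND path != ? ORDER BY path",
        (digest, path),
    ).fetchall()
    if not rows:
        return None  # no duplicate: nothing to notify
    return {
        "active_instance": path,
        "duplicates": [row[0] for row in rows],
        # Hypothetical option labels for managing the files.
        "options": ["save_anyway", "link_to_existing", "cancel_save"],
    }
```

A save hook would call `check_active_instance` before writing and show the returned notification to the user. A matching digest is strong evidence of identical contents; a byte-by-byte comparison could confirm it before any files are linked.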
Additional features and advantages of an
embodiment will be set forth in the description which follows, and in part will
be apparent from the description. The
objectives and other advantages of the invention will be realized and attained
by the structure particularly pointed out in the exemplary embodiments in the
written description and claims hereof as well as the appended drawings.
It is to be understood that both the foregoing
general description and the following detailed description are exemplary and
explanatory and are intended to provide further explanation of the invention as
claimed.
ABSTRACT
A computer-implemented method for detecting
duplicate files includes selecting a logical unit in which duplicate files are
to be detected. The logical unit
includes at least a portion of a non-transitory computer readable medium having
a plurality of files. The method includes
receiving a first file to be saved to the logical unit, and saving the first
file to the logical unit. The method
supports a scanning mode to determine if the logical unit contains duplicate
data. The method may identify duplicate
data in the logical unit during the scan.
The duplicate data may include a first result set that includes the
first file and a second file. A linking operation may be performed on the first
result set to enable a computer to access the first and second files via the
same physical data in the same memory region of the logical unit. The method also supports a monitoring mode to
monitor activity that would create duplicate data in the logical unit.
SPECIAL THANKS
I would like to thank the following people for their eminent contribution and support: Shaik Nisaruddin, Satish Kumar Govindaraju, and Neha Jain.
I would also like to add a special thanks to Milind Halageri, Shwetha Sreedharan, and Namrata Dessai Shetgaonkar for their valuable contribution.
I would also like to thank Unisys for their support in making this possible and, moreover, a reality. Unisys has provided each individual the ability to showcase their talents and leverage their innovation, dedication, commitment, and more.
Note: The patent is still being processed.
Read up on how the idea came about: http://vireal.blogspot.in/p/an-idea-day-keeps-brain-ticking-away.html