08.31.09

How To Compress And Store A Linux Archive

By Dave Taylor

We have a situation where we need to keep a ZIP archive of some data files available on our Ubuntu Linux server so that our satellite offices can grab the information through slower data lines. Problem is, the underlying files change 2-3 times a day. What's a quick, efficient way to only rebuild the ZIP archive file on our Linux system if a file's changed, but leave it as-is if everything's stayed the same?

Dave's Answer:

I really like this sort of question because there are so many different ways to solve it. You could, for example, just brute-force rebuild the ZIP archive every few hours, but that's a pretty inelegant solution and is bound to waste a lot of computing cycles, though that might not be a big deal. The bigger deal is that it could leave your remote offices stuck with corrupted archive files because a new build started halfway through their latest transfer, which is a worst-case scenario, I'm sure.

The cornerstone of this solution is a short shell script that uses "test" to ascertain whether the data source files have been updated (or, in the language of the script, are newer than the ZIP archive file). If they have, build the ZIP file under a different filename and, once the archive and compression process is done, rename the new file to the standard archive name.

The basic logic is:

if [ files-to-archive are newer than archive ] then
  rebuild archive to temp file
  mv temp file to archive
endif

Now, to turn that logic into code, we'll want to check the "test" man page, which informs us that:

    file1 -nt file2
      True if file1 exists and is newer than file2.
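
To see that comparison in action, here's a minimal sketch. The scratch directory, the two file names and the "is_newer" helper are made up for the demo; "touch -t" just gives the files known timestamps.

```shell
#!/bin/sh
# Demonstrate "file1 -nt file2" with two files whose timestamps we control.
demo_dir="$(mktemp -d)"                        # scratch directory (demo only)
touch -t 200908310900 "$demo_dir/older.txt"    # modified 09:00
touch -t 200908311000 "$demo_dir/newer.txt"    # modified 10:00

# Hypothetical helper: succeeds when $1 was modified after $2.
is_newer() {
  [ "$1" -nt "$2" ]
}

if is_newer "$demo_dir/newer.txt" "$demo_dir/older.txt"; then
  echo "newer.txt is newer"
fi
```

Swap the two arguments and the test fails, so nothing is printed.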

I have a similar situation with an archive I'm maintaining, so the first step is to ascertain which files we want to test against. In my case, it's 26 files, so having a chain of if-then-else statements would be crazy ugly. But how to ascertain which file is newest?

The solution is so simple it's eerie! Just use "ls": ls -t | head -1 gives you the most recently modified (touched) file in the directory. Since I am working with XML files it makes sense to constrain this just a little bit, so I'll use something more akin to ls -t *.xml | head -1 instead.

If I had an explicit list of files to check, it'd be easy to set a variable that contains all the names:

filenames="file1 file2 file3 file4 file5 file6 file7"
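
Here's a small sketch of how that list could feed the "ls -t" trick. The "newest_file" function name and the three demo files are my own inventions, and the approach assumes file names without spaces or newlines, since the list is split on whitespace.

```shell
#!/bin/sh
# Print the most recently modified of the files named as arguments.
newest_file() {
  ls -t -- "$@" 2>/dev/null | head -n 1
}

# Demo in a scratch directory with controlled timestamps (hypothetical files).
cd "$(mktemp -d)" || exit 1
touch -t 200908310800 file1
touch -t 200908311200 file2
touch -t 200908311000 file3

filenames="file1 file2 file3"
# $filenames is deliberately unquoted so it splits into separate names
newestfile="$(newest_file $filenames)"
echo "$newestfile"    # prints: file2
```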

So let's put it all together and see what we get:

target="everything"        # base name of the full ZIP archive; zip appends .zip
searchdb="search-database" # base name of the search db ZIP archive (not used in this excerpt)

# most recently modified XML source file
newestfile="$(ls -t *.xml | head -n 1)"

if [ "$newestfile" -nt "$target.zip" ] ; then
  # a source file is newer than the archive: time to rebuild
  zip "$target" *.xml
fi

That's basically all you need. Just make sure that "newestfile" accurately picks up which of your set of source files is newest. If you use a list of files, put that in the statement instead of an explicit pattern, as in newestfile="$(ls -t $filenames | head -1)".
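
One case worth a thought is the very first run, when no archive exists yet. Shells can differ on what "-nt" reports when the second file is missing (bash documents it as true, while the dash man page requires both files to exist), so a sketch like the following checks for that explicitly. The "needs_rebuild" function is a hypothetical helper, not part of my script, and it compares each file directly instead of parsing "ls" output.

```shell
#!/bin/sh
# Succeed (exit 0) when the archive is missing or any source file is newer.
# Usage: needs_rebuild archive.zip file...
needs_rebuild() {
  archive="$1"; shift
  [ -e "$archive" ] || return 0            # no archive yet: build it
  for f in "$@"; do
    [ "$f" -nt "$archive" ] && return 0    # this source changed since the build
  done
  return 1                                 # archive is up to date
}
```

You'd call it as "if needs_rebuild everything.zip *.xml ; then ... fi" in place of the single "-nt" test.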

The only issue remaining in the above code is the potential problem of having the archive be slowly built while a remote site is downloading it at the same time. Not good. To avoid that, build the archive under an interim name and rename it only once it's complete:

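interim="${target}-new"   # interim base name (any unused name without an extension works)

if [ "$newestfile" -nt "$target.zip" ] ; then
  # time to rebuild the archive under the interim name
  zip "$interim" *.xml
  # then rename the finished file into place
  mv "$interim.zip" "$target.zip"
fi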
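
If you'd like that build-then-rename step to survive interrupted or failed runs, one possible sketch wraps it in a function. The "rebuild_archive" name and the "-new" suffix are assumptions for illustration; "zip -q" simply keeps the output quiet.

```shell
#!/bin/sh
# Build the archive under an interim name, then rename it into place.
# Usage: rebuild_archive base-name file...   (base name without .zip)
rebuild_archive() {
  target="$1"; shift
  interim="${target}-new"              # hypothetical interim base name
  rm -f "$interim.zip"                 # drop leftovers from an interrupted run
  if zip -q "$interim" "$@"; then
    mv "$interim.zip" "$target.zip"    # rename only after a complete build
  else
    rm -f "$interim.zip"               # failed build: keep the old archive
    return 1
  fi
}
```

A call would look like "rebuild_archive everything *.xml". As long as both names live on the same filesystem, the "mv" is a rename and not a copy, so the published name should never point at a half-written archive.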

What's nice about this is that it has a very low processor footprint: when nothing has changed, all it does is list a directory and compare two timestamps. That means it's going to have minimal impact if you have the script run every hour or two via a cron job, which is what I do. In fact, my script is a bit more complex because I also take advantage of the "-x" flag to "zip", which lets me exclude specific files (such as the ZIP files themselves) from the archive, as in "zip archive * -x *zip".
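
For the cron side, an entry along these lines would run the script at the top of every hour. Both paths are placeholders, not my actual setup; the "cd" matters because the script uses relative patterns like *.xml.

```
# m  h  dom mon dow  command
0    *  *   *   *    cd /path/to/data && ./rebuild-archive.sh
```

Add it with "crontab -e" for the account that owns the data files.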

About the Author:
Dave Taylor is known as an expert on both business and technology issues. Holder of an MSEd and MBA, author of twenty books and founder of four startups, he also runs a marketing company and consults with firms seeking the best approach to working with weblogs and social networks. Dave is an award-winning speaker and frequent guest on radio and podcast programs.

AskDaveTaylor.com
http://www.intuitive.com/blog/
About StorageInsider
Enterprise storage strategies, news and reviews for IT professionals.











-- StorageInsider is an iEntry, Inc. publication --
iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509
2009 iEntry, Inc. All Rights Reserved Privacy Policy Legal
