Troubleshooting Professional Magazine
The Windows to Linux Conversion
Before answering this question, read your software's license agreement. Notice the part that forbids reverse engineering. Now read the text of the proposed UCITA legislation, which has so far been enacted in Virginia and Maryland. Notice that UCITA gives such license agreements the full force of law. So if other vendors can't reverse engineer your vendor's product, how can they make import mechanisms to bring your data into their software? Will your software's vendor be kind enough to provide an export mechanism? Can you bet your business on the hope of such kindness?
Who owns your data?
Your software vendor can triple the price of your software and stop supporting your version, or even prevent you from using it. I've even heard vague, unsubstantiated rumors that some vendors of proprietary software are starting to require you to get a onetime only installation key every time you reinstall the software you legally purchased, and even charge you for the privilege of giving you that key. In five years will they still sell you the key, or will they force you to upgrade? Given the role of application reinstallation in Windows Troubleshooting, do you think this might create problems?
Other vendors cannot legally reverse engineer to provide an import facility. What choice do you have?
Who owns your data?
As you read this month's Troubleshooting Professional, keep this one question in mind. Because I'll be describing the transition from Windows to Linux as a long, difficult process requiring advanced planning and discipline. It's a process far more costly than licensing new versions of Windows every 3 years. It's very tempting to take the course of least resistance and stick with Windows. On the desktop, at least, that was a tempting alternative even for my small company. Sure, Windows crashed all the time, but is moving Troubleshooters.Com to a Linux desktop worth 40 hours of my labor? That's about what it took. Believe me, spending $400 or whatever on Windows 2000 would have been cheap. So why did I switch?
There are those who might say I switched only to stop appearing as a hypocrite. There's some truth to that. It's hard to be a credible Linux advocate when you use Windows. But I could have quite easily advocated Linux servers, while not taking a stand on the desktop.
There are those who might accuse me of switching my desktop OS as a publicity move to "get" Microsoft. It's no secret that I've disliked Microsoft since I was forced to use their inferior C compiler in the late 80's. Yet since that time I've used, and actually advocated, Microsoft Word.
Maybe I switched just for fun. A hacking exercise. Indeed, a lot of hacker type Linux advocates brush aside all protests of business difficulties and encourage all businesses to cold-turkey switch to Linux. Until recently I was just such an advocate. But I have a 19 year old business with 16,000 data files. You don't put such a business in jeopardy just to prove a point.
Believe me, it would have been easier and cheaper for me to stick with Windows, and simply endure the taunts of Linux advocates. The only problem would have been a little voice whispering in my ear, saying
Who owns your data?
So now I own my data. Don't take my word for it. Read the text of the GPL, or the BSD license, or even the much maligned Artistic license. I have every right to the source code, and I can use it to reverse engineer a way to migrate my data, should I ever need to. Better yet, I can even use my application's source code in that migration, subject only to the copyleft restrictions of some of the licenses (in other words, I may not be able to proprietarize that migration tool). But why should I? Data migration isn't my core product.
I get fringe benefits. My new software crashes much less. For the most part, my new software has better features. In those few cases where the Windows software was more suitable (Microsoft Word and Micrografx Windows Draw), I can take my time and find great Open Source alternatives. And because I never upgraded to the post-UCITA "latest and greatest" of these apps, there are plenty of Open Source apps that can import my data.
This issue of Troubleshooting Professional first details the steps and missteps of Troubleshooters.Com's transition from the Windows desktop to the Linux desktop in an article titled The Conversion of Troubleshooters.Com, and then gives the practical tips I wish somebody had given me, before my conversion, in an article called Conversion Tips. Last but not least, this month's Linux Log discusses the realities of advocacy in a business setting -- a must read for consultants, managers, and technologists.
Troubleshooters.Com's situation parallels that of a wide variety of businesses, from those just big enough to take their data seriously, to those not quite big enough to field an army of lawyers to fight Microsoft, and an army of lobbyists to "contribute" to presidents and legislators. Yes, the transition is tough. And anyone bigger than Troubleshooters.Com will be forced to pay for the transition with cold, hard cash, rather than the "sweat equity" with which I paid. And those in public corporations will face the intense scrutiny of their shareholders and directors. It's too high a price to pay for fewer crashes, and certainly too high a price to pay in order to avoid paying licensing fees.
Indeed, there's only one compelling reason to switch away from the Windows desktop. It's because of the importance of your data. A company's data is indispensable to its ongoing operations. Severe data loss is usually followed by bankruptcy.
Ben Franklin once said "They that can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety." To that I add that those giving up liberty for safety eventually find themselves the most unsafe of all.
Some day the owner or the CEO or the board of directors will ask Who owns our data? And because you led the transition to Open Source on the desktop, you'll smile and say:
We do!
The 19 year old company widely known as Troubleshooters.Com is officially named American Troublebusters. Before 1990 it was known as Steve Litt Business Systems, and before 1986 as Steve's Stereo Repair, which was founded in 1982 with an old voltmeter, a card table, and pens and 3 by 5 cards for marketing. This company had a computer for every second of its existence.
So it's surprising that in late 1999, when I decided to migrate to Linux on the desktop, my company had never undergone a major data conversion. It was obvious that the switch to Linux would be a very major data conversion, complete with a massive change in applications and authoring techniques. It wasn't a decision taken lightly. You might wonder why I decided to convert.
One reason is that I love Linux. By December of 1999 I had one Windows desktop machine, a Linux server (Samba, Apache, DNS, DHCP), and a Linux hackaround box on which I experimented with both server and desktop work. Linux had performed outstandingly on the server, and I hoped to gain some of that productivity for my desktop. But in fact, early trials indicated that my work on a Linux desktop would be less, not more, productive. The KDE, Gnome and fvwm2 of that time were crude at best.
And to be honest, another reason was to do what I believed in. It's hard to advocate Linux when every email you send has an X-Mailer line saying "Eudora".
But love of Linux and doing what I believed in are a minor part of the story. A business doesn't survive for 19 years with decisions made on impulse or idealism. The main reason was UCITA.
UCITA is a piece of legislation, recommended for passage in the various states by the NCCUSL (National Conference of Commissioners on Uniform State Laws) on July 29, 1999. It's a set of default contract provisions that's very harsh on the software vendor, but it gives the vendor's shrink wrap license agreement the full force of law. Lawyers will argue with me, but I read all 100+ pages of the proposed legislation, and I can tell you it enforces almost any provision the vendor places in the shrink wrap license agreement. So you know those clauses against reverse engineering that we all used to laugh at? They're the law now.
That means that Open Source vendors cannot legally reverse engineer commercial software, even for the purpose of importing data made by that commercial software, if the vendor's license agreement bans reverse engineering. Look where that brings us. If and when UCITA is passed in Washington state, it will be illegal for Kword, Abiword, or even proprietary products like Star Office or Applix, to reverse engineer in order to provide a .doc import mechanism. So Microsoft can immediately change their .doc format, and nobody can import documents made with the modern version. Microsoft is under no obligation to provide export mechanisms. Basically, they're holding your data hostage.
So suddenly you have no alternative but Microsoft. Some companies in a monopoly position will treat you well. It depends on the company's mindset. Who knows. Maybe Microsoft will keep the same prices and quality after they have no competition. But that's not something I want to bet on.
I must prevent the monopolistic capture of my data at all cost. No matter how difficult, time consuming, or costly, I must get out of the Microsoft world.
Nor is UCITA the only problem. There's a trend in this country to bend over backward in granting broad and arbitrary monopolies in all matters of "intellectual property". Look at the DMCA legislation. Look at the effects of the government sponsored patent monopolies granted the drug companies. Look at the continual lengthening of copyrights now even beyond the lifetime of the author. Recent court decisions have even upheld the "inevitable disclosure doctrine", by which an employee can be barred from going to work for a competitor, even if that employee has never signed a non-disclosure. These are strange times indeed.
Open Source is the answer. By license, if worst comes to worst I have the guaranteed right to go into my software and fix problems the "vendor" won't. I have the right to use this software in perpetuity. Nobody can tell me I can't use the software to access my data. And more importantly, nobody can tell me I can't reverse engineer the program to make a data import or export mechanism. Actually, the license gives me the explicit right to COPY THE CODE TO MAKE AN IMPORT OR EXPORT MECHANISM.
Transition to Linux became a priority.
Using Caldera desktop Linux, I began experimenting heavily with the Linux desktop. It was slow as molasses on my old Pentium 150. I upgraded that box from 32Meg to 96Meg, but it just wasn't fast.
So in February 2000 I bought a dual Celeron 450 with 512Meg and 20Gig. I threw Caldera Linux on it and got to work, secretly hoping I could transition before my publicly stated July 2000 cutover date.
During the next couple months I bounced from distro to distro -- Caldera, then Corel (what a mess), then finally I settled on Mandrake. I've been a Mandrake Man ever since.
As my July 2000 date neared, I got disappointment after disappointment. PPP was much harder to configure than Windows dial up networking. Cut and paste was not consistent across all apps like it is in the Windows world. Hotkeys were inconsistent across apps, to the extent that in many apps the main menu wasn't accessible through Alt keys. The mouse was pitifully slow, and cranking up the acceleration just made it even more difficult to use. There appeared to be no equivalents for my beloved Micrografx Windows Draw vector drawing program, or the Paintshop Pro version 3 that I used on a daily basis to finish off my drawings on a pixel by pixel basis. Although early on Klyx looked like it might be a good replacement for Microsoft Word in writing long books, in fact it wasn't. There was no outliner. I couldn't get Hylafax faxing software to work. Kmail looked like a poor replacement for my beloved Eudora Lite. Netscape Composer, the mainstay of Troubleshooters.Com, had bugs and behaved in an unexpected manner under Linux. Ughh!
I wrote the June 2000 Troubleshooting Professional Magazine, themed "Making it in a Post Microsoft World", as a transition plan for myself (and of course for tens of thousands of others). I began following that plan, slowly but surely, spending most of my time in the "Desktop Experimentation" step of my conversion plan. Time rolled on...
The headline screaming across the front page was that now any computer on the Smoothwall box's subnet could connect to the 'net by using the Smoothwall box as a gateway. In a single stroke, Smoothwall removed my need for a modem on my Linux box, and my need to go through the drudgereous process of configuring PPP (with all that obnoxious routing) every time I changed configurations on my Linux box. Smoothwall fixes its own PPP in its installation, and every other box uses its PPP.
The results were almost instantaneous. I began surfing the net on my Linux box. Using a spare email address, I began dabbling with Kmail, and found it to be as good as Eudora Lite. Because of Smoothwall, I was spending much more time on my Linux box. Because of that extra time, I was learning the tips and tricks that would make me productive on the Linux desktop, and one by one knocking down those "showstopping" objections I had.
And I learned that although Dia will never replace Micrografx Windows Draw for making cartoons, or for making graphics like you see on the front page of Troubleshooters.Com, Dia is five times easier for making diagrams than is Micrografx Windows Draw. And the finished product looks better, with less "jaggies". Even though I was still authoring Troubleshooters.Com on Windows, I began making all my diagrams on my Linux box. The Linux desktop was slowly working its way into my business.
I bought the Dexxa Mouse on 1/6/2001 and installed it 1/7/2001. It made my mouse performance even worse. The Linux mouse was as bad as a notebook computer's stick mouse or thumb pad, but without even the remedy of plugging in a different mouse. I was preparing to post an email to LEAP explaining that I was abandoning the Linux transition because I couldn't accomplish anything with such a poor mouse system. But as I was penning that email, I stumbled across a little known option to go in XF86Config-4:
Option "Resolution" "1600"
That option goes with all the other options in your mouse pointer section. It sped up the mouse without resorting to jerky, quirky acceleration. I deleted my intended post, and instead wrote a long, rambling post about the fix, and how great Linux is. I had re-learned what I'd forgotten so many times before -- that Linux is so configurable and so powerful and comes with so many great tools, that you can make it do anything you want.
In fact, this became a valuable lesson in Linux advocacy (on how not to do it :-), and formed the backbone of this issue of Troubleshooting Professional Magazine. My business has been running for 19 consecutive years, so it should be obvious that I'm the one who knows my business practices, and how they're best implemented on a computer. The skeptics were great technologists, but it was incredibly obvious that they had never run a business.
In our advocacy, we need to address the fact that the guy with the business is the ultimate expert on that business, and if he says he must have features A, B, and C, we must listen very carefully, and not grandly wave aside his need and tell him he can make do with features D, E and F. When advocating to a business, we must walk a mile in the owner's or manager's shoes.
This is an important point for advocates. Until the business person is convinced that there are high quality alternatives in Linux, he has a DUTY to be skeptical. Loss of access to the data probably leads to bankruptcy. Slow or awkward access to the data drops profit. The business person's fears, no matter how unfounded you know them to be, must be understood and respected. Only then can high quality Linux alternatives be found.
The following table lists my fears uncovered in desktop experimentation, together with the eventual resolution of those fears. You might find it helpful.
My Fear | The Resolution |
PPP hard to configure. | Smoothwall. |
Cut and paste was not consistent across all apps. | I learned the little tricks to yield great productivity. It took me maybe a couple weeks. |
In many apps the main menu wasn't accessible through Alt keys. | I learned hotkeys to work even faster for my often used features. |
Mouse was pitifully slow, and cranking up the acceleration just made it even more difficult to use. | Insert the following in the proper place in XF86Config-4 (XF86Config if you don't use XFree86 version 4): Option "Resolution" "1600". See www.troubleshooters.com/linux/quickhacks.htm#mouse for details. |
No equivalents for Micrografx Windows Draw vector drawing program. | For diagrams, the GPL Dia program is vastly superior to Windows Draw. For other types of drawing, I've continued to use Windows Draw on my Windows box until I can find a good Linux based alternative. |
No equivalents for Microsoft Word. | For books, I continue to use Word for now, both to write new ones and to edit old ones, although in the future I'll probably replace it with a styles-based program -- maybe an XML based one. For all other documents I use Kword, Abiword, Star Office, Lyx, Netscape Composer, or VI as appropriate. With word wrap enabled, VI has functionality similar to CPM Wordstar. |
No equivalents for Paint Shop Pro version 3 pixel painting program. | This is absolutely false. Gimp does everything Paint Shop Pro 3 did, and a lot more. Although Gimp is very complicated to use, it's fairly simple to do the things I used to do in Paint Shop Pro version 3. For details, see my article titled The Lazy Man's Way to Linux Screenshots at www.troubleshooters.com/linux/scrshot.htm. You may wonder why I was using the 4 year old version 3 of Paint Shop Pro. It's because with version 4, they removed the stretchable rubberbanding feature, which was essential to my work. In response to my query, they said that they couldn't support both stretchable rubberbanding and multiple selections. Funny, Gimp supports both :-) |
I couldn't get Hylafax faxing software to work. | Yes, but my Winfax Pro software doesn't work either (on Windows, of course), and I paid a lot of money for that :-) |
Nothing as good as Eudora Lite for email. | KMail is better than Eudora Lite. |
Netscape Composer buggy and slow under Linux. | Once I learned the hotkeys, cut and paste tricks, and a few other goodies, it was almost as fast as authoring under Windows. Netscape Composer has a bug whereby it can't make relative links in the directory picking environment, so when I'm done with each document I run a quick VI script to make all absolute links relative. |
As you can see, I had many fears associated with switching from Windows to Linux. And rightfully so. 16,000 data files are nothing to fool around with. But, as you can plainly see, most of those fears vanished once addressed by Linux tools and techniques.
This is an important lesson for all contemplating a Windows to Linux transition. You will find many ominous problems on first research. But once you spend a little time with Linux, most of these problems vanish.
When in Linuxland, do as the Linux people do. Solutions are often found outside the program appearing to have the problem. Linux is extremely configurable, and comes with tools such as VI, awk, sed, find, grep, and Perl. With those tools, a "power user" type of person can get past most problems. And for those who aren't "power users", one day of a consultant's time can probably do the trick. Contrast this with what you need to do to get past Windows problems :-)
Now, for the first time, I was doing my day to day business on Linux. Slow and awkward at first, I gradually picked up those essential little productivity tricks, just as I had done with Windows so many years before. In most regards, Linux proved more than up to the task. And in those few places where Linux appeared to underwhelm, I quickly found different ways to accomplish the task. Indeed, after 10 years in Windows, and 4 years before that in DOS, I was quite guilty of "Windows Thinking". The month of actually working in Linux changed all that.
Over and over again, VI was my friend. Whether it was quickly finding something in a mailbox, or working around a bug in Netscape Composer, VI came to the rescue again and again. I learned configuration tweaks to speed my work. Finally, at the end of the month, I used Kmail instead of Eudora to mail out announcements of the March issue.
The time had come to go. I waved good-bye to Bill Gates and began my conversion.
I ordered a 60GB 7200 RPM IBM Deskstar to serve as the system disk, planning to make the existing 20GB the data disk. I bought an extra fan, and I bought a Hermanator hard disk heat sink with fans. On March 12, 2001, I took out my screwdriver and began my conversion.
I was slow and deliberate. I must have done 3 or 4 trial installs. I probably repartitioned 10 times. I had disk problems that I finally solved with a BIOS upgrade. I switched hard drives between IDE ports. I experimented all day.
The next day, March 13, I installed Linux, emailed my crew at LEAP telling them I'd be off the air for awhile (implying maybe days), and performed my final backup of the Windows machine, in triplicate. Then I Samba'ed my data to the new box, fired it up, and performed tests to make sure it worked properly with my data. I configured my Kmail filters and mailboxes as a clone of those in Eudora, and began using email that evening. I had been off the air for around 10 hours.
The next day, March 14, I took my first Linux system data backup so that a major data glitch wouldn't knock me all the way back to Windows. It involved rearranging my data, then creating a backup script to tar.gz all my data. Because my CD burner is still on the Windows box, I burned the CD's there. Now a data failure simply meant restoration from the Linux box's data backup. I'm in Linux to stay.
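A backup script of the kind described above can be quite short. The following is a minimal sketch, not the actual script used at Troubleshooters.Com: the function name backup_tree, the data-YYYYMMDD.tar.gz naming, and the source and destination directories are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: archive one data tree into a dated tar.gz file.
# Usage: backup_tree SOURCE_DIR DEST_DIR
backup_tree() {
    src=$1
    dest=$2
    stamp=$(date +%Y%m%d)
    archive="$dest/data-$stamp.tar.gz"
    # -C stores paths relative to the parent of the source tree,
    # so the archive unpacks as a single top-level directory.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Read the archive back as a basic check that it is not corrupt.
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

A call such as backup_tree /d /backups (both paths are placeholders) prints the name of the archive it wrote, which can then be copied to whatever machine has the CD burner.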
I spent Thursday sort of goofing around with my new system. Friday I fixed all the multiline filenames in the data tree, used find, grep and VI to make scripts to fix the case of common extensions (i.e. change .TXT to .txt), and converted DOS line terminators to UNIX type line terminators throughout my data tree for major extensions such as html, htm, txt, c, h, cpp, java, pas, and the like. Throughout all of these changes, I backed up frequently to rewriteable CD.
By 3/20/2001, it was business as usual here at Troubleshooters.Com.
More importantly, I still haven't found a suitable replacement for MS Word in writing documents more than 30 pages -- especially books. Nor have I found a suitable replacement for Windows Draw, for non-diagram type drawing. Diagrams are handled by the superior Dia GPL package.
All of this is OK. Linux is progressing rapidly, so I'd expect that sooner or later there will be a Linux tool superior to MS Word for writing books. And with the advent of XML and the Scalable Vector Graphics (SVG) language, I highly suspect that a replacement for Windows Draw is coming down the pike. But I'll probably use Windows Draw for years to come on old drawings, because Windows Draw stopped all development years ago, and they don't export to a usable vector format. And because Windows Draw, good as it was, was just an also-ran in the graphics world, I've found no Linux packages that could import .drw files.
So my Windows box sits in the corner, an appliance to run legacy software. In my opinion, that's the ideal use of Windows.
This article gives tips on how best to approach some of these issues.
Ban proprietary upgrades and new proprietary programs immediately
Segregate your data from your apps and OS
Make a transition plan
Find dead ends in your plan, and find ways out of those dead ends
Create a practice setup
Get hardware ready to accept Linux
Install Linux
Do final Windows backup
Samba data across to the Linux box
Test to verify a working setup
Do a backup of your Linux data
Rearrange directories to suit new environment
Delete or rename DATA filenames with spaces
Correct the case of DOS extensions
Convert DOS format text files to UNIX format
Back up
Install Samba and VNC so you can use Windows programs for which there's no substitute
Continue to seek substitutes for remaining Windows apps
A huge issue is brewing with copy protection. I've heard a vague and unsubstantiated rumor of the existence of some modern proprietary software that requires a new "key" from the vendor each and every time you reinstall. According to this rumor, they charge significant amounts (like $35.00) to read you that key over the phone. Every app, $35.00 (or whatever). Get a virus? $35.00. Disk crash? $35.00. Windows meltdown? $35.00. Motherboard dies? $35.00. Windows registry got too big? $35.00. And in 5 years, do you think the vendor will be willing to sell you the key? Or will they force you to upgrade all those apps you use just to get at ancient data? And of course, those new versions might convert your data to a format not importable into any other software. All this is rumor, but my reading of the text of the UCITA proposed legislation tells me that it's all perfectly legal and enforceable under UCITA. The software vendor wouldn't really do this to their customers, would they? I don't know -- how do you think Microsoft would handle such a situation? Would you bet your business on it?
You can't switch all your apps at once. You must continue to use your Windows box as an appliance to run a few apps. You need to be able to install and reinstall that appliance and those apps over and over again in years to come.
Stop upgrading and buying proprietary apps today!
Can you imagine how horrible it would be to have various Windows system files scattered throughout the data on your new Linux desktop? Segregate your data before the transition.
Although it was written 6 months ago, the June 2000 Troubleshooting Professional Magazine provides an excellent example of a transition plan. For the most part it's the exact plan I executed in my transition. Read it!
Be sure the Windows backup is in a common format readable in Linux and most other operating systems. Linux is wonderful, but we all might be on something even better five years from now. Make sure the backup includes an uncompressed version of the program used to archive and compress the data. That way, no matter what happens in Linux, you can always format up a Windows box, restore the files, and Samba them over.
Make sure you use good, reliable, ubiquitous media. It's very realistic to assume you'll need this data seven years from now. I once needed a 13 year old file from my Kaypro, and couldn't find it. Nor could I read my old Kaypro diskettes -- the Uniform CPM diskette reader appears not to work with modern DOS.
As shown later in this article, there are automated ways to convert DOS text to UNIX text for specified extensions, and these methods are actually safer than the conversion in FTP.
Plan to back up often in the days following your transition. Back up before and after you handle multi-word filenames. Back up before and after you change file extensions to the correct case. And of course, back up before and after you convert DOS text files to UNIX format. The rule of thumb is to back up before any work that could cause file corruption or file loss, and to back up after any large pieces of work that change the data.
Once your data is in the form you'll use on a daily basis, you can resume your normal backup schedule.
find /d -type f | grep "Copy of" | xargs -P10 -n1 rm
Sure, that's easy and quick, but it's easy to make a mistake, and the stinky stuff will really hit the fan if you make a mistake. If the grep command had a -v, the preceding command would delete everything but the "Copy of" type files.
So instead, I redirect a list of the files to a script. From the tweak directory, I'd run the following command:
find /d -type f | grep "Copy of" > danger.sh
Now file danger.sh contains a list of every file in the tree whose name contains "Copy of". You can quickly peruse it and see if it contains any files you really don't want to delete. If so, delete the file's line from the file. Next, use VI to turn it into a script. In the case of a deletion, it's as simple as this inside VI:
:%s/.*/rm -f "&"/
The quotes are needed because these filenames contain spaces. After making the file into a script, eyeball it once again to make sure it does what you want, and then save it and make it executable (using chmod). Now you can run the script knowing exactly what it will do.
During my conversion, I went the xargs route for awhile. It didn't really save much time, because I had to check, double check, and triple check every command. There's no second chance with a single command mass-deletion. Half way through the conversion I decided it was worth the extra effort to create scripts enumerating each file to convert. That's what I recommend to you.
With apologies to the classic "Real Men Don't Use Pascal", creating a file by file script to do a mass conversion or deletion is "What you see is what you get", while using a single command containing xargs is more "You asked for it, you got it". I'm just not man enough for the latter. That's why every deletion and conversion in this article uses a file by file script rather than a single command.
One more thing. I always call those scripts danger.sh. Why? Because they're highly dangerous, and anyone tempted to run them needs to know that. Delete such scripts when you're done with your conversion.
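The list-then-script workflow can also be sketched without opening an editor for the substitution step. This is only an illustration under stated assumptions: the function name make_delete_script is hypothetical, sed stands in for the VI substitution, and each filename is wrapped in double quotes because names like "Copy of myfile.txt" contain spaces.

```shell
#!/bin/sh
# Hypothetical helper: write a reviewable deletion script, one rm per file.
# Nothing is deleted here; the output is only a list of commands.
# Usage: make_delete_script TREE PATTERN SCRIPT
make_delete_script() {
    tree=$1
    pattern=$2
    script=$3
    # Wrap each matching path in double quotes so embedded spaces survive.
    # Assumes no path contains a double quote, dollar sign, backquote,
    # backslash, or newline.
    find "$tree" -type f | grep "$pattern" | sed 's/.*/rm -f "&"/' > "$script"
}
```

Running make_delete_script /d "Copy of" danger.sh leaves the tree untouched. The review step stays the same: read danger.sh, remove the lines for files worth keeping, and only then run it with sh danger.sh.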
find /d -type d | grep " "
Fix as appropriate. There probably won't be too many.
Now find out how many files have spaces in their filenames:
find /d -type f | grep " "
If there are just a few, fix them manually. But if there are more than maybe 10, it's best to go semi-automated. The first step is to see how many are infamous "Copy of myfile.txt", or "Copy (3) of mydoc.doc", or the like. Your mileage may vary, but I decided to delete all of those. Why? Because they were almost certainly made as a temporary measure. If I really wanted to keep a version of a file, I would have named it differently. And of course, I still have the file on my final Windows backup.
So from your tweak directory you can do this:
find /d -type f | grep "Copy of " > danger.sh
Edit danger.sh with VI, peruse the list, and make sure you want to delete them all. If you run across one that shouldn't be deleted, delete its line from danger.sh. Now turn every line into an rm -f command, quoting each filename because it contains spaces:
:%s/.*/rm -f "&"/
Look at the file one last time to make sure it does what you want, and if so, save it and run it. All those obnoxious "Copy of" files will be gone.
Follow the same procedure for those pesky "Copy (3) of" files, starting with the following:
find /d -type f | grep "Copy (.) of " > danger.sh
Now that you've hopefully deleted most of your space containing files, you're ready to tackle the remainder. Obtain a list of all files containing spaces in the filename:
find /d -type f | grep " " > danger.sh
Look at the list. You may wish to leave many as-is. For instance, if you have a hundred multiword filenames in a directory, and they all pertain to a Windows program, and you won't be using them much in the Linux environment, you may wish to leave them as-is. If so, delete them from the file. But be aware that multiword files can mess up directory tree based processes.
Look at the file, and decide whether you're going to delete most of them or rename both of them. What you will do is make the file into a script to either delete or rename, and then change those files that are exceptions. In my case, most of them got renamed because I didn't have time to view them before deletion. So I made a rename script using VI:
:%s/\(.*\)/mv "\1" "\1"/
The preceding command says "take the entire content of the line and store it in \1, then replace the line with mv, then \1 in quotes, a space, and \1 in quotes again". The quotes keep the shell from splitting the source name at its spaces. The result is that each line is a move command whose source and destination are identical. Next you go line by line to each destination and change it to a filename with no spaces. Remember, if you see a file that you want to delete, simply remove its destination and change the mv to rm -f.
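For a long list, the rename lines can be generated rather than edited by hand. The following is only a sketch, assuming you want every space in the filename itself replaced by an underscore. The function name make_rename_script and the throwaway demo directory are made up for illustration. It leaves directory names alone, and it will not cope with filenames containing double quotes, dollar signs or backticks, so still read the generated script before running it.

```shell
#!/bin/sh
# Hypothetical helper: print one quoted mv command for each file whose own
# name contains a space. Only the filename changes; directories are kept.
make_rename_script() {
    find "$1" -type f -name '* *' | while IFS= read -r src; do
        dir=$(dirname "$src")
        base=$(basename "$src")
        dst="$dir/$(printf '%s' "$base" | tr ' ' '_')"
        printf 'mv "%s" "%s"\n' "$src" "$dst"
    done
}

# Demonstration on a throwaway directory, never on real data.
demo=$(mktemp -d)
touch "$demo/my report.txt" "$demo/plain.txt"
make_rename_script "$demo" > "$demo/danger.sh"
cat "$demo/danger.sh"    # review this before running it
```

On your real data you would point the function at /d, redirect the output to danger.sh in your tweak directory, and edit any destinations you don't like.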
Filenames containing spaces can really mess up commands, especially mass-file commands. They require gratuitous use of quotes. They're ugly. Once you've eliminated multi word filenames from your data (once again, not from files pertaining to your Linux or Linux app installation), it's time to fix file extension cases and convert DOS format text files to UNIX format.
Start with a list of all the .txt files on your system:
# find /d -type f | grep -i "\.txt$" > danger.sh
In the preceding command, note the -i arg to grep. That makes the grep case insensitive, yielding .txt, .TXT, .tXt, and any other case combinations. Note also that the regular expression escapes the dot with a backslash, because otherwise the dot would mean "any character". Note also the dollar sign on the end, which stands for end of line. If the .txt isn't the final thing on the line, it's not a .txt file. You certainly wouldn't want to rename a file like myreport.TXT.tar.gz.
So now you have a list of all the files ending in the upper or lower case letters t, then x then t. You certainly don't want to rename any files that are already lower case, so delete them from danger.sh with this simple VI command:
:g/\.txt$/d
The preceding command deletes every line whose extension is already lowercase .txt, while leaving those whose extensions are not entirely lowercase.
Now convert the file into a script to rename the files with lowercase extensions. First make move commands like this:
:%s/\(.*\)/mv \1 \1/
Now every line is a move command with destination identical to source. Change the destinations to lowercase as follows:
:%s/\.txt$/.txt/i
The dollar sign on the end of the search string matches "end of line" and therefore prevents this from changing the source extension. The i on the end of the command makes the search case insensitive, so that it will find .TxT at the end of a line, and change it to .txt.
Examine the file, make sure it does what you want to the files that need it, and if so, save it and run it. The cases of all your .txt files will be lowercase. You can do this with other common Linux extensions, such as .htm, .html, .c, .cpp, .h, .java, .pas, .pl, .py, and the like.
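If you have many extensions to process, the same list-then-script idea can be wrapped in a small function. This is a sketch, not what I ran: the name make_lowercase_script is invented, and it relies on find's -iname test, which GNU and BSD find provide but which is not in every find. It prints quoted mv lines only for files whose extension matches in some mixed or upper case.

```shell
#!/bin/sh
# Hypothetical helper: print mv commands that lowercase one extension.
# $1 is the directory tree, $2 the extension in lowercase (for example txt).
make_lowercase_script() {
    dir=$1; ext=$2
    # -iname matches any case; ! -name drops names that are already lowercase
    find "$dir" -type f -iname "*.$ext" ! -name "*.$ext" |
    while IFS= read -r src; do
        # ${src%.*} strips the final extension, whatever its case
        printf 'mv "%s" "%s"\n' "$src" "${src%.*}.$ext"
    done
}

# Demonstration on a throwaway directory, never on real data.
demo=$(mktemp -d)
touch "$demo/NOTES.TXT" "$demo/Report.TxT" "$demo/readme.txt" \
      "$demo/myreport.TXT.tar.gz"
make_lowercase_script "$demo" txt > "$demo/danger.sh"
cat "$demo/danger.sh"    # review this before running it
```

Because the pattern is anchored to the end of the name, myreport.TXT.tar.gz is left alone, just as with the dollar sign in the grep version.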
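DOS ends text lines with a carriage return (Ctrl+M, ^M, octal 015, decimal 13) followed by a linefeed. UNIX ends text lines with just the linefeed (Ctrl+J, ^J, octal 012, decimal 10).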
Sometimes it doesn't matter. For instance, VI handles both DOS and UNIX formatted files exactly how you would want. But most UNIX utilities and apps that read a line grab everything up to the linefeed, meaning they deliver a line of text with the Ctrl+M tacked on the end. This usually causes processing errors.
You haven't lived until you debug a formerly working CGI script that, unknown to you, has been saved as DOS and FTP'ed binary up to a UNIX server.
So to make a long story short, you must attempt to make as many of your text files as possible comply with the UNIX convention, now that you're on a UNIX box. But that must be balanced by safety concerns.
Imagine a data file that is not line based. Maybe it's fixed record length. It has Ctrl+M characters as legitimate data (maybe they represent a value of 13 in a byte field that can contain a value between 0 and 255, or maybe part of a 4 byte integer). Imagine deleting all the Ctrl+M characters from that file. As you'd imagine, its app would malfunction. So instead you delete only Ctrl+M's that immediately precede Ctrl+J's. So maybe one record in the file has that sequence, and its Ctrl+M is deleted. Now when you run the app, it works perfectly on all records above the one with the deleted char, but it malfunctions on that record and all below it. If you're really lucky, the program will segfault when it hits the bad record. But more likely it will simply shift everything left one byte, and output numerically wrong data. And maybe segfault on that final short record (or maybe not).
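You can watch this happen on a made-up file. The sketch below builds two hypothetical 4 byte records, the first of which happens to contain the bytes 13 and 10 as data, and then strips the carriage return the way a text conversion would. The file names and record layout are invented purely for the demonstration.

```shell
#!/bin/sh
demo=$(mktemp -d)
# Two hypothetical 4-byte binary records. The first holds bytes 13 and 10
# (carriage return, linefeed) as ordinary data, not as a line ending.
printf 'A\r\nB' >  "$demo/records.bin"
printf 'CDE\n'  >> "$demo/records.bin"
before=$(wc -c < "$demo/records.bin" | tr -d ' ')

# Strip a carriage return that immediately precedes a linefeed.
cr=$(printf '\r')
sed "s/$cr\$//" "$demo/records.bin" > "$demo/converted.bin"
after=$(wc -c < "$demo/converted.bin" | tr -d ' ')

echo "before=$before after=$after"    # prints: before=8 after=7
od -c "$demo/converted.bin"           # every byte after the A has moved left
```

One byte is gone, so the second record now starts one byte early, which is exactly the silent shift described above.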
As if all of this isn't enough concern, you must make sure the conversion does not alter the file date. After all, if you didn't care about the file date, you would have used FTP instead of Samba to move the files from the Windows box to the Linux box, because many FTP clients can text convert by extension. But you wanted to preserve that filedate, so you used non-text-converting Samba.
The preceding discussion introduces the fact that EXTREME CARE must be taken when converting from DOS text to UNIX text format. It's better to forgo converting 100 legitimate text files than to wreck one binary file. But leaving all text files in DOS format is not an option, as it would slow your work for years to come. So you examine the tradeoff between automation and safety, and do your best. This article describes what I did.
I used semi-automated methods. Basically, I used the find command to assemble huge lists of files, and converted them to scripts that call a conversion program for each file. I used several safety measures:
So let me now introduce the single file conversion program, a shell
script called crlflf which I put on the path. The explanation
of the program follows its code:
#!/bin/sh
# NO WARRANTY, USER IS RESPONSIBLE FOR ANY DAMAGE CAUSED BY THIS SCRIPT
log=crlflf.log
errlog=crlflferr.log
testfile=crlflftest.txt
original=$1
dosname="$original.dos"
if test $# -eq 1; then                ##### No wildcards allowed: safety
  cat -v $1 | grep -q "\^M$"
  if test $? -ne 0; then              ##### If already UNIX format, do nothing
    msg="$(date +"%Y/%m/%d %H:%M:%S") : \
      Already Unix file, no action taken: [$original]"
    echo $msg; echo $msg >> $log
  else                                ##### If not UNIX format, convert if text
    #*** Files with chars other than         ***
    #*** space-tilde, ^M, ^L, ^Z and Tab     ***
    #*** are considered binary               ***
    #*** and therefore should not be touched ***
    sed -n -e '/[^ -~^M^L^Z\ ]/p' $original > $testfile
    if test -s $testfile; then        ##### If binary, don't convert
      msg="$(date +"%Y/%m/%d %H:%M:%S") : \
        ERROR: Suspected binary file [$original]"
      echo $msg; echo $msg >> $log; echo $msg >> $errlog
    else                              ##### If not binary, convert
      mv -f $original $dosname
      cat $dosname | sed -e 's/^M^M*$//g;s/^Z//g' > $original
      touch -r $dosname $original
      msg="$(date +"%Y/%m/%d %H:%M:%S") : Converted $original"
      echo $msg; echo $msg >> $log
    fi
  fi
else                                  ##### If multiple args, do nothing
  msg="$(date +"%Y/%m/%d %H:%M:%S") : \
    ERROR: Wrong # of args, expect 1, got $# from [$@]"
  echo $msg; echo $msg >> $log; echo $msg >> $errlog
fi
cat -v $1 | grep -q "\^M$" -- cat -v outputs the file after changing all control characters to a caret (^) followed by a letter. This command tests for the presence of a carriage return at the end of a line, indicating a DOS formatted text file. Please remember that in this particular command, the caret and the letter are 2 separate characters!
[^ -~^M^L^Z\ ] means a line containing any char not in the space through tilde printables or a carriage return (^M), a page eject (^L), or a DOS EOF char (^Z), or a tab character (^I, which shows up as a tabstop of whitespace in a default configured VI editor). Please remember when cutting and pasting the preceding code that you must change the ^M to a real Ctrl+M by typing Ctrl+V followed by Ctrl+M. Same goes for the other control chars.
s/^M^M*$//g;s/^Z//g means delete all sequences of carriage returns occurring at the end of a line, and delete all Ctrl+Z characters. Please remember when cutting and pasting the preceding code that you must change the ^M to a real Ctrl+M by typing Ctrl+V followed by Ctrl+M. Same goes for the Ctrl+Z control chars.
In the preceding script, all filenames and filename components are defined as variables. First it's tested for multiple arguments, and exits with an error if there are multiple arguments. The reason is simple. Wildcards expand to multiple arguments, and nobody wants someone issuing the command crlflf *.
The next test checks if it's already in UNIX format, and if so, the file is not touched. It might have been easier to program and faster to let the program harmlessly copy the UNIX file, but for safety's sake, we touch only what we must. The test for DOS uses cat -v to express control characters as printable 2 character equivalents, starting with a caret (^). So in this particular command, ^M is two separate characters. If you don't like this, I'm sure you can find a substitute using sed and real control characters.
After that, it's tested to see whether it's binary or text, and if binary, an error is issued and the file is not touched. This test uses sed to build a file comprised of all lines containing any character not in the set [space-tilde, Tab (^I), Carriage Return (^M), Linefeed (^J), Pagefeed (^L), DOS EOF (^Z)]. If the file considered for conversion has only those characters, it's very probably intended to be a text file. Otherwise, there's a significant likelihood it's intended to be a binary file.
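If typing real control characters into a sed expression bothers you, the same test can be approximated with tr and octal codes. This is an alternative sketch, not the test crlflf performs: the function name looks_like_text is invented, and it counts leftover bytes instead of collecting offending lines. It accepts the same character set: space through tilde, tab, linefeed, pagefeed, carriage return and Ctrl+Z.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file contains only space through tilde
# plus tab (011), linefeed (012), pagefeed (014), carriage return (015)
# and Ctrl+Z (032). Anything left after deleting those suggests binary data.
looks_like_text() {
    leftover=$(LC_ALL=C tr -d '\011\012\014\015\032\040-\176' < "$1" |
               wc -c | tr -d ' ')
    test "$leftover" -eq 0
}

# Demonstration on throwaway files, never on real data.
demo=$(mktemp -d)
printf 'plain text\r\nwith a tab\there\r\n' > "$demo/notes.txt"
printf 'AB\200\001CD'                       > "$demo/data.bin"
for f in "$demo/notes.txt" "$demo/data.bin"; do
    if looks_like_text "$f"; then
        echo "text:   $f"
    else
        echo "binary: $f"
    fi
done
```

Like the sed version, this is only a heuristic. A binary file that happens to contain nothing but these characters would still be judged text.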
The conversion itself simply deletes any Ctrl+M at the end of a line
(that is, immediately before a Ctrl+J), or any consecutive run of Ctrl+M
characters at the end of a line, and also deletes the Ctrl+Z characters,
which are probably used only as an EOF marker in programs operating under
very old versions of MS-DOS. You might wonder why I delete consecutive
runs of Ctrl+M at the end of the line. I do it because as files are FTP'ed
both ways between UNIX and DOS, often incorrectly, very often several Ctrl+M
characters are piled up before the Ctrl+J. Now that you're presumably not
going to be doing much exchange with a Windows environment, it's a good
time to clean up that mess for good.
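Here is the conversion step in isolation, as a sketch you can try on a scratch file. To sidestep the Ctrl+V typing, it builds the control characters with printf and stores them in the shell variables cr and ctrlz, which are names I made up for this example. The sample file and its contents are likewise invented.

```shell
#!/bin/sh
demo=$(mktemp -d)
cr=$(printf '\r')          # a real Ctrl+M
ctrlz=$(printf '\032')     # a real Ctrl+Z

# A scratch file with a doubled Ctrl+M, a normal DOS line, and a Ctrl+Z.
printf 'one\r\r\ntwo\r\nthree\032\n' > "$demo/page.htm"
touch -t 200101011200 "$demo/page.htm"     # pretend the file is old

# Same sequence as the script: keep the original, convert, restore the date.
mv -f "$demo/page.htm" "$demo/page.htm.dos"
sed -e "s/$cr$cr*\$//;s/$ctrlz//g" "$demo/page.htm.dos" > "$demo/page.htm"
touch -r "$demo/page.htm.dos" "$demo/page.htm"

od -c "$demo/page.htm"     # no \r and no 032 should remain
ls -l "$demo"              # both files should show the same date
```

The .dos copy stays behind as a fallback until you are satisfied with the converted file.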
# find /d -type f | grep -i "\.htm$" | xargs -P10 -n1 crlflf
As I said earlier in this article, "you asked for it, you got it". Actually, as far as I know nothing went wrong. As far as I know :-).
Don't repeat my mistake. Your data is too valuable. Do what I learned to do:
# find /d -type f | grep -i "\.htm$" > danger.sh
# vi danger.sh
:%s/^/crlflf /
And then you examine your script, make sure you won't be doing anything horrible, and save and exit. But don't run it yet! If you remember, the crlflf script writes to log files crlflf.log and crlflferr.log. These two files, especially the latter, are absolutely vital to cleaning up after running your script. So be sure to delete them before running your script. You don't want ghosts of runs past confusing you in the cleanup stage.
So now run the script:
# ./danger.sh
You'll see all sorts of messages scrolling down your screen. Hopefully most will say "Converted <filename>", and most of the remainder will tell you that no action was taken because it was already a UNIX file. But it's likely you'll get some errors, mostly because crlflf considered the file binary. Look at the files listed as binary in crlflferr.log.
Try the following command on each one:
cat -v myfile.htm | \
  sed -e 's/\^M//' | \
  grep "\^" | \
  less
The preceding command converts unprintables to printable representations using the caret (^) character, and then filters it through grep to print only the offending lines. Then look in VI and search for the same lines. In many cases what you'll find is a single character represented by a tilde followed by a letter, such as ~V. These are characters outside of normal ASCII, usually above 126 (tilde). I believe they were inserted by Windows programs to represent things like trademark symbols, non-breaking spaces (which are often better handled as &nbsp;), and em dashes (which I like to represent as two hyphens -- easier to work with in text editors). Most can be deleted or replaced with regular ASCII. In such cases you may wish to update the filedate. If so, instead of running crlflf on the file, simply execute the following VI command before saving:
:set fileformat=unix
The main point is you've done the easy 95% of your conversion in an automated fashion, leaving you more time for the files requiring a tough call.
In my personal dealings I've noticed that ruthlessness, not pride, comes right before the fall. When the guy who scares you most gets the most scary, typically the next couple years someone will take him out of the picture.
History certainly backs up this theory. Remember when Richard Nixon used the military to block access to the evidence bound to impeach him? Many thought he would take over the country and become king, but in fact he resigned in disgrace less than a year later. My favorite historical instance occurred during the fall of communism. When retrograde communist hard-liners attempted a coup against Mikhail Gorbachev, it looked like the bad old Soviet Union would carry the day. But Boris Yeltsin led his followers from atop an armored car, demanding and getting Mr. Gorbachev's return, thus ending the coup.
In hindsight, it's obvious why ruthlessness precedes a fall. The powerful once had something going for them -- some attribute over and above brute force propelled them above the much stronger competitors of their youth. But after years of ruling by force, force is all they know -- all they can respond with, and ultimately force and only force is never enough.
In "MS To Users: Pay Up", Mitch Wagner describes how Microsoft is cracking down on customers whom it feels aren't paying for all the software they're using. Microsoft now demands narrow interpretation of licensing language, and often calls for audits to prove it. Wagner's article tells of a bank that couldn't prove it had licenses for some of its systems, and had to pay an extra $10,000.00.
This article tells how Microsoft wanted full price for thin client usage from Alaska Airlines, and how Alaska Airlines wrote its own app instead of paying Microsoft the extra $250,000.00.
Most telling was the article's explanation for Microsoft's crackdown. Although Wagner said it more nicely, the basic story line is that Microsoft's earnings are now flat, and they have to get the money somewhere, so they're squeezing it out of existing customers. That would be a sound business practice if they were a total monopoly, but with cracks in the monopolistic wall, including Linux and writing your own app, Microsoft's latest move might turn out to be their last moment of power.
A common theme in futuristic fiction is society or a large body of humans going "over the top" in their rules and practices. George Orwell's "1984" is certainly the poster child for this genre. Ira Levin is a frequent creator of such stories, including "This Perfect Day" and my personal favorite, "The Stepford Wives". All such stories share the common trait that the group's behavior is explainable by motive, but unbelievable within the bounds of current human society.
Andrew Orlowski's "All your data (and bizplans) are belong to Microsoft" would be in the same genre as "1984", "This Perfect Day" and "The Stepford Wives" but for one small detail -- it's true. Its truth can be verified by the Microsoft Passport user agreement, whose URL is http://www.passport.com/Consumer/TermsOfUse.asp.
Basically, Orlowski reveals that Microsoft's Passport service
"Terms of Use and Notices" gives Microsoft the right to:
And it also brushes aside the copyrights and trademarks of the owners
of the material traveling through Passport with the following language:
"The foregoing grants shall include the right to exploit any proprietary rights in such communication, including but not limited to rights under copyright, trademark, service mark or patent laws under any relevant jurisdiction. No compensation will be paid with respect to Microsoft's use of the materials contained within such communication. Microsoft is under no obligation to post or use any materials you may provide and may remove such materials at any time in Microsoft's sole discretion."
The first paragraph of Orlowski's article sets the tone:
With Microsoft's HailStorm .NET initiative hinging on the company's very own PassPort service, you'd think Redmond would be bending over backwards to stress the confidentially of user information.
Hey -- wasn't .Net supposed to be "cross platform"? Well, I guess it is, but according to this article it all goes through the Microsoft PassPort website. A website which, incidentally, can harvest any of that data and use it as Microsoft and its "affiliated companies" see fit. Orlowski ponders the safety of your business plan if it's emailed to a coworker and that email goes through PassPort.
Hey -- isn't that paranoid? Microsoft isn't your competitor. Well, um, that very statement once applied to the likes of Netscape and DBase. Personally, if I were in the roofing business I wouldn't feel safe from Microsoft's competition.
And as the operator of Troubleshooters.Com, I want Microsoft to know that just because somebody views my website through PassPort or .Net or Hailstorm doesn't mean Microsoft owns my pages, and if I send an email to someone and that email goes through PassPort, Microsoft has no right to what I put in the email. So I changed the Troubleshooters.Com copyright statement to reflect the new realities. But that's discussed in another article.
My first response to Orlowski's article was disbelief. It's the disbelief of "1984", "This Perfect Day" and "The Stepford Wives". My next emotion was a cold, calm, slowly growing hatred -- accompanied by the sure conviction I must do something about this.
Obviously not really an article, but instead an entire magazine. In fact, the magazine you're reading. If you view the Wagner and Orlowski articles as the threat, you can view this magazine issue as the response to the threat.
When you advocate Linux, mention these three articles together. When you write your congressman or senator or president, mention these three articles together. Let them know you don't vote for people so dense they don't understand the heinous nature of Microsoft's actions.
One might think that Wagner, Orlowski and I got together to "simulate an outpouring of 'grass roots support'" of Linux. After all, according to an article on the front page of the April 10, 1998 Los Angeles Times, that's just what Microsoft did (paying for it as an "out-of-pocket" expense). And indeed, I'll be approaching Wagner and Orlowski about running coordinated articles in the future. But no, we all just came out with similar themes within a day of each other. Probably because Microsoft's increased ruthlessness is one of the most visible trends.
And of course, we all know what ruthlessness leads to.
It doesn't take a rocket scientist to see that in order to protect myself, I must prohibit all viewing and quoting of my material by any technology taking that data through the PassPort server. And I believe that includes .Net and Hailstorm. You can view my new copyright.
My life will become interesting when Microsoft Internet Explorer starts depending on PassPort, at which time it will be illegal for some 80% of the Internet users to view my website. Could that be what Judge Jackson meant by "monopolistic power"? But my fate will be far kinder than most. I'll still own my data. Pity the poor lambs who figure "oh, Microsoft won't really enforce that part of their contract". I'm betting Microsoft will.
If you own any type of what's typically called "intellectual property", I suggest your copyright statement and all licenses contain verbiage preventing the viewing or transmission of your materials through servers, services and protocols that claim rights to your data.
Those days are gone. In 2001 we all know about each other, we've formed a community, and we have a very viable product. We need to move that product into businesses large and small.
The people who sign on the dotted line are business people, not pre-existing Microsoft haters. To these business people, the very words our community employed in its formation are credibility killers. The business person doesn't want to hear that we hate Microsoft. He doesn't even want to hear WHY we hate Microsoft.
But he just might want to hear why HE should be very afraid of Microsoft, if that message is coupled with a realistic, executable plan to remove the risks associated with Microsoft.
So just for today, I'd like to talk with you as a business person, not as a Linux advocate. I'd like to tell you what advocacy worked on me, and what advocacy fell on my deaf ears. So let's sit down, you and I, and chat, business person to Linux advocate. Pull up a chair -- let's talk.
But I'm a tiny information business. Surely that couldn't be true of a giant bricks and mortar company -- let's say Wal Mart. Let's explore that. What would happen to Wal Mart if they lost all their data and couldn't retrieve it within a couple weeks? What if their famed inventory system failed? I'm not an expert on Wal Mart, but I'd imagine they'd go belly up. That's why they, and almost every other major company, carefully safeguard their data. A large company is more at risk than I am. If I irretrievably lost all my data I'd just go out and get a job. I'd be out the $5,000 for the company's physical assets. But if a Wal Mart, or a McDonald's, or a GM went bankrupt, it would cause complete economic disruption for tens of thousands, and maybe for the country.
A company's data, and its access to that data, is crucial to its survival.
So when people implied I was a "hypocrite" for using Windows while advocating Linux, it fell on deaf ears. When they implied I was "chicken" for not switching immediately, I ignored them. I had a plan, I was executing that plan, and if I finished it a few months late, who cares. As a business person, I found absolutely no credibility in the people advising me to "do it and do it now". They may have been great Linux gurus, but it was obvious they had never run a business. Because anyone with any kind of ownership or management position in their past knows that the business's data must be protected at all times, and any transition must be done according to a well thought out plan. When it comes to data, shooting from the hip means getting your head blown off.
Believe it or not, I was occasionally flamed and labeled some sort of Windows guy when I asked "how do I work around this?" or "how do I find a Linux equivalent for that?". Visualize this. I was working a transition plan, obviously highly motivated to make the switch, and people flamed me because I wanted my business processes, at least in the near term, to remain stable through the transition.
Because I'm a Linux advocate and committed to Linux, I let all those insults roll off my back. But for every one of me, there are a hundred business people, with no OS ax to grind, who just want the best for their data and their access to that data. How will they perceive an operating system whose self-proclaimed proponents insult them while proudly displaying a complete lack of business understanding?
To summarize, the worst possible thing is to tell a business person he should convert to Linux right now.
There are other approaches that don't work. Emphasizing price is usually a bad ploy. Realistically, the cost of a conversion to Linux could outstrip several years of proprietary software licenses.
Obviously, don't mention that it's the same blind obedience to Microsoft which the company is displaying, that caused this whole problem in the first place. Business people aren't in the business of doing things for the benefit of society. If they were, their shareholders -- you know, people like you and me -- would fire them.
Don't argue with the business person. If the point of contention is business related, they're almost certainly right. And if it's a purely technological point, you can illustrate the truth without arguing.
And the skillful Linux advocate will certainly find a way to have the business person articulate the fact that most of these problems are intentionally caused by Microsoft, and that the problems will continue to get worse, and that today's conversion pain could save considerably more pain down the road, when Microsoft is basically the landlord for the corporation's data -- the only landlord possible due to license provisions and unbalanced legislation, and the landlord's response to every request is "if you don't like it, go somewhere else".
You need to get the business person to voice what you know to be true -- companies that stay too long with Microsoft will be severely damaged.
Study IBM. The average business person respects IBM. Know IBM's Linux offerings and their official Linux position. Be ready to respond when someone knowledgeable brings up the fact that vast parts of IBM use no Linux. Understand WebSphere as an alternative to .Net.
The tried and true "get a server in the door" technique is still the best in most cases. As I related in the rest of this issue of Troubleshooting Professional, converting to a Linux desktop can, at times, get rather dicey. It's worth it, but it's not an easy sell. A Samba or Apache server is a much easier sell. Sure, the company needs Win2000 for their MS SQL Server (which they need for their specialized apps, even though it's almost trivial to design an app portable between DBMS's). But when their data files outgrow that Win2000 box, they don't need to buy another Win2000 box. A simple Samba server with RAID will do the trick. And if the subject of Win2000/Samba incompatibility comes up, as far as I know, Samba works with any client using the SMB protocols, and you can't turn off SMB unless every single client is Win2000. If there are any Win9x clients, SMB must be enabled, and that means Samba will work for everyone.
Web servers might be a little harder. Many businesses have been sold a bill of goods that development is much easier with IIS and ASP than with Apache and PHP. But sometimes an Apache server is just what the business needs.
Sometimes you'll hear objections to Linux you think are just plain bogus. Fact is though, they're life and death to the person contemplating the transition, so it's imperative that they be taken seriously. Look at some of the problems I had with Linux in the early stages of my transition, and the solutions I found:
Ask the business person if she's really read the contents of her license agreements. Ask about the true effect of the non-reverse-engineering clause, which is to trap her forever in that software. Ask about the clauses concerning publishing negative information about the software's performance. Why does the truth scare them so much? Ask why it was so important to the drafters of UCITA to allow the software manufacturers to absolutely and completely absolve themselves of all responsibility for any and all bugs. You might want to carry a license in your pocket, with the scariest clauses marked with highlighter.
The tendency of proprietary software is increasingly to lease back to you access to your own data. Ever so gently, ask the business person whether she thinks Bill Gates would be a cooperative and beneficial landlord.
And politely mention that UCITA's enforcement of licenses' no-reverse-engineering clauses is like a lifetime lease, allowing the landlord to raise rents and let the property degenerate, but not allowing you to move.
Your challenge is to put across the point that yes, it really is that serious.
Today every Microsoft hater knows about Linux, and most are vehemently pro-Linux. Further advocacy to Microsoft haters is beating a dead horse. Now we millions of Microsoft haters need to advocate to the business community, who so far have heard only Microsoft's slick and glossy side of the story. These are business people -- the guys we used to call "suits" -- the guys who used to call us "techies".
We need to listen to the business people -- really listen to their fears and problems. We need to suggest business practical ways to migrate to Linux, and we need to get these people to articulate their fears and frustrations with Microsoft and proprietary software in general.
By submitting content, you give Troubleshooters.Com the non-exclusive, perpetual right to publish it on Troubleshooters.Com or any A3B3 website. Other than that, you retain the copyright and sole right to sell or give it away elsewhere. Troubleshooters.Com will acknowledge you as the author and, if you request, will display your copyright notice and/or a "reprinted by permission of author" notice. Obviously, you must be the copyright holder and must be legally able to grant us this perpetual right. We do not currently pay for articles.
Troubleshooters.Com reserves the right to edit any submission for clarity or brevity. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.
Submissions should be emailed to slitt@troubleshooters.com, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):