Limits of Excel as Data Tool
People do use tools not specifically designed for the task sometimes when the best tool for the task is not available or — in my case — not entirely to be trusted. I’m sure my distrust of proprietary software is slowly becoming apparent — perhaps too much so — and I might have to edit these pages to clear all that stuff out, eventually. In any case, this is why I used an Excel spreadsheet for database use. Well, one reason, anyway: we don’t have access to Access at work.
But now, we’re going to discover that using a flat-file database (which is essentially what a spreadsheet is) has its limits. The second page of data we copied to sheet is a list of our transactions. On this sheet, that first column — the transaction numbers — is made up of unique numbers. On the first sheet, that column has duplicates, since one transaction can involve a purchase of more than one item.
In order to easily manage a one-to-many record set, we’ll need a relational database. Yes, it can be done in VBA code, but here is where another shortcoming of using Excel versus using a RDBS (relation database system) comes to light: SQL. SQL stands for ‘Structured Query Language’. Excel doesn’t support it: most RDBSes do. And when you’re doing reporting, SQL makes your life SO much easier!
So, we’re going to make a change, but it will be bigger than it first appears. For reasons that will become increasingly apparent to anyone reading these pages, it is of utmost importance to me that I own freely what I will have created. So, I wish to create this in software — and in an OS — that supports that freedom. I will not be bound by some EULA that can dictate — at some point down the track, as Instagram users have discovered to their horror — what is to be done with what I create. Here are the proposed steps to take:
- Chose a libre Operating System, i.e., some flavour of GNU/Linux
- Continue development in the same essential Language — BASIC — in GNU/Linux
- Use a stand-alone data container, again, something libre — I chose SQLite
For OS, I run Linux Mint 13 Maya 64-bit. On the issue of ‘free’ versus ‘libre‘ I have addressed it thus: I have donated to the Linux Mint project and will continue to do so. Whilst they are not strictly GNU(freedom)-compliant in that they do include some non-free software in their distro, at this point it is a significant step away from the malware that is Windows. Nothing reports back to the mother ship.
For software, I have purchased the dual-platform version of Q7Basic and the 3-platform licence for PureBasic, neither of which I’ve been using at all. Instead, I’ve been developing in Gambas which is free. To this effort I have donated as well.
All this to say: I have no intention of perpetuating the notion that I’m promoting free software as to mean ‘no-cost software’. That is not the meaning of ‘free’ in this context, which is why I use the term ‘freedom’ or ‘libre‘ to differentiate and clarify that I’m espousing a cause, not being a cheapskate.
The fewest get that.
Preparing the Data
After the date conversion has been done to both spreadsheets — Itemised Purchases and Transactions — it’s time to get them ready for export to csv. So:
- Copy these sheets each to a new workbook
- Remove the columns that do nothing: the ones with ‘Reset’, ‘Complete’ or ‘View Order’ and the old, pre-conversion dates.
- Check there are no commas anywhere in any of the fields, which will stuff up the import to SQLite – replace any commas with a dash ( – )
- Save to .csv, comma delimited.
Then, in the Transactions ‘table’ (which is what they’re going to be) I’ll have 3 fields:
- PurchaseOrderNumber (unique values)
- Amount
- DateOfPurchase
…and in the ItemisedPurchases table I have 3 fields as well:
- PurchaseOrderNumber (NOT unique values… there are some duplicates)
- ItemName
- DateOfPurchase – removed
The Date of Purchase field is actually redundant in the Itemised Purchases table, so I’m going to remove it.