Thursday, December 8, 2011

How To Convert All Text To Lowercase In Google Spreadsheets

We consume a lot of table-like data from clients that has a lot of human error in it. Often this is big enough that manual fix-up isn't fast enough (more than 100 rows) but not so big that a more elaborate solution is appropriate (millions of entries needing formulas and machine learning). It's just a static table of data that needs to get loaded into a Dictionary<string, string>.

When doing so we dump the data into Google Spreadsheets, perform some fixups, and then share it with the client to verify it's accurate. One annoyance is there's no way to just switch a column to lowercase directly, so here's one way to pull it off:

  1. Select the column, right-click, and click Insert Column Right.
  2. In the first cell of the new column enter formula =LOWER(A1) (or whatever the first cell is of the original column).
  3. Copy the cell with the formula in it.
  4. Select the new column and paste - Google Spreadsheets is smart enough to adjust the row number for each pasted entry. You now have a column with lowercase in it - but it requires the original column to the left to still be there.
  5. Select the new column again and copy.
  6. Right-click and click Paste Special > Values only.
This wipes out the formulas. You can now delete the original column. Obviously this applies to anything you could do with a formula in spreadsheets.
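
If the table ends up in code anyway, the same cleanup can be done at load time instead of in the spreadsheet. This is only a minimal sketch in plain Javascript: the sample rows and the names lowercaseColumn and toLookup are made up for illustration, and a plain object stands in for the Dictionary<string, string>.

```javascript
// Hypothetical client rows: [key, value] pairs with inconsistent casing.
var clientRows = [
  ['Amherst', 'MA'],
  ['BOSTON', 'MA'],
  ['hartford', 'CT']
];

// Return a copy of the rows with one column lowercased,
// the in-code equivalent of =LOWER() plus Paste Special > Values only.
function lowercaseColumn(rows, columnIndex) {
  return rows.map(function (row) {
    var copy = row.slice();
    copy[columnIndex] = String(copy[columnIndex]).toLowerCase();
    return copy;
  });
}

// Load the cleaned rows into a plain key/value lookup.
function toLookup(rows) {
  var lookup = Object.create(null);
  rows.forEach(function (row) {
    lookup[row[0]] = row[1];
  });
  return lookup;
}

var lookup = toLookup(lowercaseColumn(clientRows, 0));
console.log(lookup.boston); // 'MA'
```

The original rows are left untouched, so you can still show the client what they sent next to what you loaded.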


Wednesday, November 9, 2011

NoSQL - Where's It Going? Where Should It Go?

The NoSQL movement is either saving web platforms or a major nuisance, depending on what kind of developer you happen to be. Either way, the way we store data is shifting. There are the old stand-by Relational DBs like MySQL, Oracle and SQL Server, and then there's all the crazy new wave stores - BigTable, HBase, Cassandra, SimpleDB, etc, all falling under the general category of "NoSQL."

Each of these has its own design features and focus, and they have less in common than maybe an umbrella like NoSQL should allow. Then again, by defining them by what they are not, I suppose a ham sandwich could also be eligible for the NoSQL category.

In the simplest cases, all of these solutions allow you to get a basic job done: store rows of similar-ish data in a list. After that, things get crazy.

The first general category they differ by is performance. Some favor write performance. Some favor low-latency consistency. Some favor read performance. Some favor availability. It's unfortunate my choice of data store for my entire app impacts these factors. Really these are all strategies I'll need in varying amounts for different tasks my app performs.

The second general category they differ by is how you read and write data. Each generally has its own new-fangled API for accessing it. Some have libraries that let you pretend you're still using SQL, and tend to throw a lot of errors when you do anything interesting, let alone fancy. For example, Amazon's SimpleDB, while indeed simple, cannot handle relating data between 2 tables (which it calls domains). While it has a SQL-like interface that sits on top of its API, most of the SQL you're used to using will throw an error (like JOIN). What a nice cage to build an app inside of.

This failing of SimpleDB and some of the other NoSQL options seems to be from a bit of confusion about what NoSQL means. Although it generally means it's not a Relational Database, that doesn't mean data never has relationships. It simply means that it has no explicitly defined relationships in the database itself - really what people want is to never think about a Foreign Key again in their lives. Put another way, relationships are business logic that belong in code, not the database. Fetching data and how that's performed is the responsibility of the data provider, not the app.

It seems there needs to be a general standard for NoSQL databases that's defined by what it is rather than what it is not, and here's what I see developers really looking for:

  • Named tables
  • Allows you to submit arbitrary data into tables ("schemaless")
  • Does not enforce data relationships (not a relational DB), but...
  • Allows you to join tables in queries
  • Allows sorting and filtering
  • Scalable
That's what people are really looking for in a NoSQL database. It eliminates the upfront cost of schemas, and eliminates a lot of the performance cost of storing all those rows in a scalable way. The one big burden hanging out there is handling joins - but it's still something that can be accomplished with scalability in mind, and it can be done at the data layer so a master in the data provider can service requests for any data of any shape.
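
To make the wishlist concrete, here is a toy in-memory sketch of what "schemaless but joinable" could look like from the app's point of view. It is not how any real NoSQL store works: the tables, the data, and the join function are all hypothetical, and a real data provider would do this work on its side rather than in the client.

```javascript
// Two named "tables" holding arbitrary rows (hypothetical data).
var tables = {
  users: [
    { id: 1, name: 'Joe', city: 'Amherst' },
    { id: 2, name: 'Sarah' }                   // no city: no schema to violate
  ],
  orders: [
    { userId: 1, total: 40 },
    { userId: 2, total: 15, coupon: 'FALL' },  // extra field is fine too
    { userId: 1, total: 25 },
    { userId: 9, total: 5 }                    // dangling reference: nothing enforces the relationship
  ]
};

// Hypothetical query-time inner join: pair each left row with every
// right row whose rightKey equals the left row's leftKey.
function join(left, right, leftKey, rightKey) {
  var index = Object.create(null);
  right.forEach(function (row) {
    var key = row[rightKey];
    (index[key] = index[key] || []).push(row);
  });
  var out = [];
  left.forEach(function (leftRow) {
    (index[leftRow[leftKey]] || []).forEach(function (rightRow) {
      out.push({ left: leftRow, right: rightRow });
    });
  });
  return out;
}

// Join, then filter and sort, all without any declared relationship.
var bigOrders = join(tables.users, tables.orders, 'id', 'userId')
  .filter(function (r) { return r.right.total >= 20; })
  .sort(function (a, b) { return b.right.total - a.right.total; });

var summary = bigOrders.map(function (r) {
  return r.left.name + ':' + r.right.total;
});
console.log(summary); // ['Joe:40', 'Joe:25']
```

The relationship between users and orders exists only in the query, which is the point: the store never had to be told about it up front.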

I put scalable last because the truth is that a lot of apps being built with NoSQL solutions are just hopeful. They have no need for something more scalable than a typical MySQL or even SQL Server Express instance can provide. But they do want to be done with schema management, and they want to design their app so it can handle the big time if it gets there.

There are some further features I'd like to see ideally, but don't have to be there to fulfill the basics of what these modern DBs ought to be:
  • Ideally: Allows querying with SQL
  • Ideally: Handles indexing of multiple columns, preferably in response to queries
  • Ideally: Lets you specify the strategy for a specific table's storage ("the engine" by today's terms)
  • Ideally: Handles sharding etc strategies for you so you can store all data in a single named table, even if it's broken down into many smaller tablets under the hood
I don't have a great name for where these solutions all seem to be headed. Schemaless Joinable Tables?

Monday, October 10, 2011

Google's Dart

Google's Dart looks pretty cool.
http://www.dartlang.org/docs/getting-started/

It borrows (and improves) the only thing I like from PHP - shorthand for variables and expressions in strings:
'Hello $name' replaces $name with the value of the name variable.
'Answer: ${a + b}' performs the expression a + b and swaps that result in.

It also allows both var and typed variables in the same app like Javascript and C#, and borrows one of C#'s best features, Lambda Functions:

num circumference(num r) => 2 * 3.14 * r;

Finally, it doesn't have a heavy focus on making things private - nothing is private by default (you instead make something private by prefixing its name with an underscore). This is a somewhat odd syntax, but I think one of Javascript's enduring and under-recognized strengths is that everything is by default public. This makes Monkey-Patching broken-but-useful libraries possible, something that's impossible in Java and has caused enormous amounts of pain in numerous past Java projects I've worked on.

Looking forward to seeing where Dart goes next. My suggestion: Port the Closure library to Dart.
http://code.google.com/closure/library/

Thursday, September 15, 2011

Surviving Google's Blogpocalypse

I attempted to log in to my Google Apps version of Gmail one day and was instead presented with a page I couldn't circumvent. Over 20 checkboxes, several tabs that didn't look like tabs, and a lot of confusing options. Eventually I was able to access Gmail again.

The next time I attempted to log in to Blogger however, things went poorly. As it turns out this transition does not have a migration path for Blogger/Blogspot, so you have to migrate manually.

Migrating manually is not obvious or easy. Here are the steps I had to take - maybe they'll help others:
  1. Choose the option to create a new personal Gmail account for your blog. You'll need to go through the usual Gmail signup process where you create another username, another password to remember, and have to enter another arbitrary security question.
  2. Log in with the new Gmail account. You can sign out of your existing account, or do this in an Incognito window or another browser to skip the logout step.
  3. Now you have access to your blog again! ...but you don't want to have to use this other random Gmail account to edit it every time.
  4. Get to Settings > Permissions
    1. In the old look: Under the blog name, click Settings, then the Permissions tab.
    2. In the new look: Click the blog name, click Settings, and look under Permissions.
  5. Click Add Authors, and add your old Apps account. The catch: you can't just add it normally. The operation quietly fails with no open invite, no indication the invite went out, no email, and no indication of an error. Instead, you need to use an alias for your account. If you have multiple domains associated with your account this is simple - use one of the alias domains. If you don't, you'll need to create an Alias for your user in the Domain Admin, then invite that Alias.
  6. Check your email for the invite, and accept.
  7. Come back to the incognito window (or, sadly, log out and log in as the ephemeral Gmail account you created). Get to Settings > Permissions again, and now change your invited self to Admin instead of Author.
  8. You can finally return to using Blogger the way you always did before Google ruined your day.
I'd like to point out that this transition really should have been entirely under the covers - as a user, I log in with my Google Apps email address to edit my blog. I want to keep doing so. That Google is going through a major systems changeover shouldn't require me to go through all of this trouble. If the transition is going to be mandatory, it should have waited until all products could be migrated automatically. Instead a lot of non-technical users had to deal with this insane process that offers no support.

Google is notoriously poor at customer support. I actually recall standing at TGIF listening to a Googler ask the founders why customers are pushed to post their problems to Support forums no one reads, getting no response unless an employee happens to take it upon themselves to look into it, or it makes it onto the front page of Slashdot. Sergey Brin's actual response was, "Well we shouldn't resolve these issues by having a big customer service department. We should resolve them by writing better code." I really should've grabbed a mic and described a metaphorical situation in which a farmer starts closing the barn door after his cow wanders off, but alas.

This is the support thread for this problem. It's safe to assume no Google employee will ever respond to it, let alone read it:
http://www.google.com/support/forum/p/blogger/thread?tid=239869e385664e6b&hl=en&fid=239869e385664e6b0004acf9193ad5a4

Tuesday, August 30, 2011

Kintera.org/Blackbaud.com infecting its users - on its donation page

I recently tried to donate money to a friend's charity. The page is hosted on Kintera.org, which includes a form to collect credit card info, and a Java applet that shows who else has donated recently. It uses a scrolling library they probably pulled off some untrustworthy website (I doubt it's the worse possibility - Kintera willfully infecting those making donations).

Unfortunately that scrolling library has 3 viruses, all of which act as Trojans to infect the user's machine and place them at the whim of a command and control bot network:

Java CVE-2008-5353.KM
Java CVE-2009-3867.GC
Java CVE-2008-3869.M

That's pretty embarrassing. The scroll page actually shows one page before you fill out your credit card info, so in the absolute worst case scenario, you view the page, click Continue while the infection is occurring, a keylogger downloads and runs, you enter your credit card info, and off it goes to as many as 3 bot network owners/users. Not cool.


Confidence indeed.

Monday, August 29, 2011

How to Root the HTC Evo Shift 4G

Sprint blocks their forums from viewing by non-logged-in users; this same information is posted at:
http://community.sprint.com/baw/message/329584

But you probably can't view it. Here it is reposted: How to root the HTC Evo Shift 4G.

You need the JDK installed:
http://www.oracle.com/technetwork/java/javase/downloads/java-se-jdk-7-download-432154.html

The Android SDK installed:
http://developer.android.com/sdk/index.html

And the HTC Sync software installed:
http://www.htc.com/www/help/ (scroll down to HTC Sync for all HTC Android phones and click Download)

Now follow these instructions:
http://forum.xda-developers.com/showthread.php?t=1185243

You'll need to cd into the directory where the Android SDK was installed, and then into the platform-tools directory inside that, in order to run adb and perform the other commands they ask you to run. You also need to move the 3 files they tell you to download into platform-tools (or, reference the path you downloaded them to in the commands you run - adb push).

This works on the current version as of this posting Aug 27, 2011: Android 2.3.3, but is unlikely to work in a future OTA update if there is one. Note that this only gives you temporary root but that's all you need to wipe out built-in apps you don't want. Note also that other temp root solutions like Visionary and permanent root solutions like ShiftRR will not work. Only the method linked to above will work on this latest OTA.

You can easily delete built-in apps while rooted by installing ES File Explorer from the Market (it's free), then go into Menu>Settings and check Root Explorer, then check Mount File System. Then browse to /system/app (you may need to change Home Directory to / instead of /sdcard to get to it). Press and hold on built-in apps you don't want, then tap Delete.

I deleted Amazon MP3, Nascar, NFL ("sfl-prod-release.apk"), Sprint Navigator, Sprint TV, and Swype (so I could install the latest). I doubt it's smart to get rid of the annoying Sprint Zone app because it appears to be how PRL updates etc get onto the phone.

You can prevent future OTA updates from putting all these apps back on by tapping Menu>Settings>Software Updates>HTC software update and uncheck Scheduled check. You can always explicitly ask for an OTA update if you want by coming back to this screen and tapping Check now.

Thursday, August 11, 2011

Stop Enforcement of Patents Without a Publicly Available Product


http://mobileopportunity.blogspot.com/2011/08/case-for-software-patents.html

He takes a long time to get to it, but I 100% agree:

restrict the right of "non-practicing entities" (patent trolls) to sue for patent infringement.

That's exactly what we need. Unfortunately he spends most of his time rehashing an old debate, briefly mentions this with no ideas on how to implement it (a tough problem), and moves on.

I think you could lay down some pretty simple rules. First, you could state that a patent cannot be enforced in court if what it protects is not available to the public either through your company or through a company that has licensed it. What this would lead to is a big company potentially stealing your idea while you develop it - but you can always finish the race to get it to market THEN sue for past damages. I think this is an acceptable outcome. It would prevent patent trolls from suing because they obviously have no intention of introducing a competing product, and the cost of doing so would be too high.

It would leave the licensing option open to some abuse though, and "available to the public" needs a tighter definition as well. But hey - it's a start. More than this guy tried.

He also leaves out one last negative impact of patents: They completely disclose to the world the details of what makes your product special. They protect you against competition only within the country (and even then, probably only from small players in the country - big companies have a long history of kicking over the little guy, patents and all). I question whether the value of patents remains for small innovators (which should be the goal) when they have to fully disclose what they're patenting. It seems like you should be able to file a patent, get approved, but not have it go public until you give a say-so (basically when the product is released). There's no point in having the patent anyway until then (because you can't sue until it's available to the public), and making it known beforehand is dangerous - Chinese manufacturers love to just steal designs wholesale and give US companies the finger.

That's the final piece that's missing - worldwide protection after disclosure. That's really an enforcement problem. I suppose that's up to the PTO and the US as a whole to enforce - but only after we get our own **** together.

Friday, July 22, 2011

JQuery Utility Functions - grep()

JQuery's documentation can be a little light sometimes. Today we're looking at an underused core JQuery function: $.grep()

First, the weird name. If you're not a Linux nerd, you should know that grep is a command in Linux that essentially searches, or more specifically, filters, stuff you pass to it. I would've preferred they call this command .filter() or .where(), but so be it.

As you've likely inferred, grep filters an array or collection of tags. It's like the WHERE clause from SQL or .Where(x => x...) in .NET's LINQ.

I'm going to send you straight to the fiddle for the code:

$.grep() example over an array
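
In case the fiddle isn't handy, here's a minimal sketch of $.grep() over a plain array. The ages data is made up, and the jq variable is an assumption of this sketch: it uses the real jQuery when it's loaded and otherwise falls back to a tiny stand-in that mirrors the documented $.grep(array, function(element, index), invert) signature, so the snippet runs outside a page too.

```javascript
// Use jQuery when it is loaded; otherwise fall back to a small stand-in
// (an assumption for this sketch, not jQuery's actual source).
var jq = (typeof jQuery !== 'undefined') ? jQuery : {
  grep: function (array, fn, invert) {
    var kept = [];
    for (var i = 0; i < array.length; i++) {
      // Keep the element when the callback's verdict differs from invert.
      if (!!fn(array[i], i) !== !!invert) kept.push(array[i]);
    }
    return kept;
  }
};

var ages = [12, 35, 17, 64, 21]; // hypothetical sample data

function isAdult(age) {
  return age >= 18;
}

// Keep only the entries the callback returns true for.
var adults = jq.grep(ages, isAdult);

// Passing true as the third argument inverts the filter.
var minors = jq.grep(ages, isAdult, true);

console.log(adults); // [35, 64, 21]
console.log(minors); // [12, 17]
```

With jQuery on the page you'd drop the fallback and call $.grep() directly.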

And of course, you can use it over a collection of tags - but the syntax is strange. For example, if you want to get a collection of input tags on the page where the user typed at least 10 characters:

var inputs = $.grep($('input[type=text]'), function(input) {
    return input.value.length > 9;
});

What's odd about this is it breaks convention with the rest of the JQuery collection methods. For example, you can set the text color to red for all tags selected with:

$('div').css('color', 'red');

So you would expect this to work:

$('div').grep(...

But it does not, even in the most recent version (as of this posting, 1.6.2). Instead you have to use .filter().

So now with the previous blog post on $.map(), we can get a little fancy. Suppose we've got a bunch of address objects from the server of the format:

var addresses = [
  {
    name: 'Brass Nine Design',
    address1: '321 Sesame St',
    city: 'Sesame',
    state: 'MA'
  }, ...
];

Suppose the user is going to enter a string, and you need to search names and addresses against it, then show the user the companies that match - listed by name. Remember that map is essentially SQL's select, and grep is essentially SQL's where. First you filter the list with grep, then extract the fields you wanted with map:

var matchingNames = $.map($.grep(addresses, function(addr) {
  return addr.name.indexOf(phrase) >= 0 || addr.address1.indexOf(phrase) >= 0;
}), function(addr) {
  return addr.name;
});

Voila.
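
Here's the same idea as a self-contained sketch you can run. A few things are assumptions layered on top of the snippet above: the function name findMatchingNames, the two extra sample addresses, and the case-insensitive matching (the snippet above is case-sensitive). The jq variable uses the real jQuery when it's loaded, and otherwise a simplified stand-in for $.grep() and $.map() so the code runs anywhere.

```javascript
// Use jQuery when it is loaded; otherwise fall back to simplified stand-ins
// (assumptions for this sketch, not jQuery's actual source).
var jq = (typeof jQuery !== 'undefined') ? jQuery : {
  grep: function (array, fn) {
    var kept = [];
    for (var i = 0; i < array.length; i++) {
      if (fn(array[i], i)) kept.push(array[i]);
    }
    return kept;
  },
  map: function (array, fn) {
    var out = [];
    for (var i = 0; i < array.length; i++) {
      var value = fn(array[i], i);
      // Like $.map, skip null/undefined and flatten returned arrays one level.
      if (value != null) out = out.concat(value);
    }
    return out;
  }
};

var addresses = [
  { name: 'Brass Nine Design', address1: '321 Sesame St', city: 'Sesame', state: 'MA' },
  { name: 'Acme Tools', address1: '9 Brass Rd', city: 'Springfield', state: 'MA' },  // hypothetical
  { name: 'Corner Bakery', address1: '45 Main St', city: 'Amherst', state: 'MA' }    // hypothetical
];

// Filter with grep (the WHERE), then project with map (the SELECT).
function findMatchingNames(list, phrase) {
  var needle = phrase.toLowerCase();
  var matches = jq.grep(list, function (addr) {
    return addr.name.toLowerCase().indexOf(needle) >= 0 ||
      addr.address1.toLowerCase().indexOf(needle) >= 0;
  });
  return jq.map(matches, function (addr) {
    return addr.name;
  });
}

console.log(findMatchingNames(addresses, 'brass')); // ['Brass Nine Design', 'Acme Tools']
```

Lowercasing both sides before comparing is a design choice, since users rarely type company names with the right capitalization.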

Thursday, July 21, 2011

JQuery Utility Functions - $.map()

A lot of devs use JQuery nowadays without realizing how many little utilities are sitting inside of it. You can save yourself some coding time, and a fair number of bytes, if you understand what's underneath the hood. You might even get better performance given the optimizations that have been made to these core functions. Top of mind are:

.data()
.grep()
.map()
http://api.jquery.com/category/utilities/

You might begin by asking, "Why don't I just get rid of these? I don't use them." Unfortunately they're core to how JQuery works, so if you use a little bit of it, these are coming along. May as well learn them.

Let's do .map() today.

.map() is basically a translation function for collections. You pass in an array and a function, and the function is used to translate the array into another array. If you've used .NET's LINQ before, .map() is the .Select(x => x...) of Javascript.

As a simple example, suppose you have an array of objects you received from the server:

var people = [
  { name: 'Joe', city: 'Amherst', ... },
  { name: 'Sarah', city: ... },
  ...
];

Now suppose you wanted to just get a list of names in the data. The obvious way is to loop through it, either with for(var i = 0...), or if you want to get fancy, $.each(), but either way it's about the same number of bytes for the user. Here's the $.each() code:

var names = [];
$.each(people, function() {
  names.push(this.name);
});

And of course you'll end up with something that looks like
var names = ['Joe', 'Sarah', ...];

Let's see it in .map()!

var names = $.map(people, function(person) {
  return person.name;
});

And you end up with the same result. When this gets minified, it's easily the byte-winner. It's also likely that browsers will keep adding fast built-in aggregation functions - as they do, .map() is likely to adopt them, and you get performance gains for free.

Here's a JSFiddle with the 3 approaches, if you want to experiment:
http://jsfiddle.net/XdWVx/

In a sense, $.map() is $.each() for when you know you're building a new array from the thing you're looping through. As you might guess, you can use this on collections of tags. So for example if you wanted to rapidly gather the elements of a form into an array of values:

var values = $('input[type=text]').map(function(i, input) {
  return input.value;
}).get();

It's important to notice a troublesome jQuery inconsistency here. $.each() iterating over a list calls back to a function that takes 2 arguments, the index of the array and the value at that index. $.map() over a list passes the same 2 arguments in the opposite order, the value first and then the index, while the collection method $(...).map() passes the index first, like .each() does - those who value consistency may writhe at this.

This inconsistency carries over when you loop over an object. In this case, $.map's callback takes the property's value and then its name - reversed from the order used in $.each():

http://jsfiddle.net/b9chris/MGvp3/

Since a lot of legacy code likely depends on this lack of consistency in jQuery, the only clean answer is likely a small library that sits on top of it that rights the ship and cleans up the mess. Something tells me I'm not the first OCD developer to notice this - perhaps it's already out there. Links to libraries on github or other places that solve this problem are welcome in the comments.
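
For what it's worth, the kind of shim I have in mind could be as small as this. It's only a sketch: eachValue is a hypothetical helper name, not part of jQuery, and jq uses the real jQuery when it's loaded and otherwise a simplified stand-in for $.each() (which calls back with the index or key first, then the value) so the code runs anywhere.

```javascript
// Use jQuery when it is loaded; otherwise fall back to a simplified stand-in
// (an assumption for this sketch, not jQuery's actual source).
var jq = (typeof jQuery !== 'undefined') ? jQuery : {
  each: function (collection, fn) {
    var i, key;
    if (Object.prototype.toString.call(collection) === '[object Array]') {
      for (i = 0; i < collection.length; i++) {
        // Like $.each: (index, value), and returning false stops the loop.
        if (fn.call(collection[i], i, collection[i]) === false) break;
      }
    } else {
      for (key in collection) {
        if (fn.call(collection[key], key, collection[key]) === false) break;
      }
    }
    return collection;
  }
};

// Hypothetical helper: iterate with (value, indexOrKey), the same order
// $.map() uses, so callbacks can be shared between the two.
function eachValue(collection, fn) {
  return jq.each(collection, function (indexOrKey, value) {
    return fn(value, indexOrKey);
  });
}

var seen = [];
eachValue(['a', 'b'], function (value, index) {
  seen.push(index + '=' + value);
});
eachValue({ city: 'Amherst' }, function (value, key) {
  seen.push(key + '=' + value);
});
console.log(seen); // ['0=a', '1=b', 'city=Amherst']
```

Wrapping instead of patching $.each() itself keeps legacy code that depends on the old argument order working.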

Happy mapping.