How can you clean input data?

If you’re going to “diagnose the state of your data”, why not clean it up as well? There’s a lot you can do at entry time, at retrieval time, and at any point in between:


- wash non-printable characters 

- wash illogical (depending upon context) characters 

- trim leading and trailing spaces 

- verify check digits 

- verify lookups 

- verify against standards (USPS, etc.) 

- add Soundex, Metaphone, Levenshtein, etc. 

- build a context-suitable hash 

- add Soundex, Metaphone, Levenshtein, etc. to the hash 

- wash standard keywords
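
A few of the steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a definitive implementation: the function names (`clean_field`, `luhn_is_valid`, `soundex`, `match_key`) are hypothetical, Luhn stands in as one example of a check-digit scheme, and SHA-256 over "Soundex code plus normalized postcode" is just one assumed choice of context-suitable hash. Lookups and verification against standards such as USPS need external reference data and are not shown.

```python
import hashlib
import unicodedata

# Letter-to-digit table for American Soundex.
_SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def clean_field(text: str) -> str:
    """Wash non-printable characters, then trim and collapse whitespace."""
    kept = []
    for ch in text:
        if ch.isspace():
            kept.append(" ")  # tabs, newlines, etc. become plain spaces
        elif unicodedata.category(ch)[0] != "C":
            kept.append(ch)   # drop control/format/unassigned characters
    return " ".join("".join(kept).split())


def luhn_is_valid(number: str) -> bool:
    """Verify a Luhn check digit (assumed scheme; others exist)."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def soundex(name: str) -> str:
    """Return the four-character American Soundex code, or '' if no letters."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":    # H and W do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


def match_key(name: str, postcode: str) -> str:
    """Hash a phonetic name code plus a normalized postcode (assumed context)."""
    normalized_postcode = clean_field(postcode).upper().replace(" ", "")
    basis = f"{soundex(clean_field(name))}|{normalized_postcode}"
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()
```

With this sketch, `match_key("Robert", "SW1A 1AA")` and `match_key(" rupert ", "sw1a1aa")` produce the same key, because both names share a Soundex code and the postcodes normalize to the same string. Whether that degree of fuzziness is appropriate depends entirely on the context the hash is meant to serve.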