Tuesday, February 12, 2013

getting the current time in hadoop-hive

As I keep forgetting how to do this:

from_unixtime(unix_timestamp())
will give you the current time in a query (as a "yyyy-MM-dd HH:mm:ss" string).


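To get year:weeknumber (the original was missing the closing parenthesis of concat; "limit 1" returns a single row instead of one per table row):
select concat(year(from_unixtime(unix_timestamp())), ":", weekofyear(from_unixtime(unix_timestamp()))) from <tablename> limit 1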
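To sanity-check what those expressions should return, here is a rough Python equivalent. It is only a sketch. It assumes that Hive's weekofyear() uses ISO-8601 weeks (weeks start on Monday, and week 1 contains at least four days of the new year), which is how I understand Hive to behave. The function names are made up for illustration.

```python
from datetime import datetime

def current_time_string(now=None):
    """Rough equivalent of Hive's from_unixtime(unix_timestamp()):
    local time formatted as yyyy-MM-dd HH:mm:ss."""
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d %H:%M:%S")

def year_week(now=None):
    """Rough equivalent of concat(year(...), ":", weekofyear(...)).
    Assumption: weekofyear() is the ISO-8601 week number."""
    now = now or datetime.now()
    iso_week = now.isocalendar()[1]
    # year() is the calendar year, not the ISO year, so around New Year
    # the pair can look odd: 2012-12-31 falls in ISO week 1 (of 2013).
    return "%d:%d" % (now.year, iso_week)

if __name__ == "__main__":
    print(current_time_string())
    print(year_week(datetime(2013, 2, 12)))   # 2013:7
    print(year_week(datetime(2012, 12, 31)))  # 2012:1
```

If you group data by year:week, watch the last few days of December. As the second example shows, year() and weekofyear() can disagree there.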

Friday, December 7, 2012

python debugs


Three categories of bugs:
1. Actual bugs (logic and more): I did not notice the schema change, so I ran into runtime issues.
2. Code bugs: I forgot to test all code paths.
3. Typos: possibly because I worked on this remotely. Existing code stopped working because I accidentally removed a "," between two strings.
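Item 3 is especially easy to hit in Python because adjacent string literals are concatenated implicitly. This example assumes the two strings sat next to each other in a list or an argument list (the values are made up). In that case, deleting the comma does not raise any error. It just quietly merges the two strings:

```python
# Adjacent string literals are joined at compile time, so a missing
# comma is not a syntax error; it silently changes the data.
with_comma = ["select", "from"]
without_comma = ["select" "from"]  # comma accidentally deleted

print(len(with_comma))     # 2
print(len(without_comma))  # 1
print(without_comma)       # ['selectfrom']
```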

Monday, June 18, 2012

pain of line endings - remove the DOS line endings

The pain of \r\n. If your scripts on *nix give strange errors, DOS (Windows) line endings might be the cause. This is a classic problem when you mix Windows and Linux.


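#!/bin/sh
# remove DOS (\r\n) line endings from every file given as an argument
# note: \r in the sed pattern works with GNU sed; BSD/macOS sed may not support it
for i in "$@"
do
  sed 's/\r$//' "$i" > "$i.out" && mv "$i.out" "$i"
done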


Save it as script.sh, make it executable (chmod +x script.sh), and run it on your troublesome files:
./script.sh <files>
e.g.
./script.sh *.sql
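If your sed does not understand \r, you can do the same cleanup in Python. This is a sketch, and fix_eol.py is a hypothetical file name. Unlike the sed script, it only turns \r\n pairs into \n and leaves any stray \r elsewhere in a line alone:

```python
import sys

def strip_dos_line_endings(path):
    r"""Rewrite the file at `path` in place, turning \r\n into \n.
    Returns True if the file was changed."""
    with open(path, "rb") as f:
        data = f.read()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed == data:
        return False
    with open(path, "wb") as f:
        f.write(fixed)
    return True

if __name__ == "__main__":
    # usage: python fix_eol.py *.sql
    for name in sys.argv[1:]:
        if strip_dos_line_endings(name):
            print("fixed", name)
```

The script reads and writes in binary mode so that Python's own newline translation cannot interfere.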


Thursday, November 10, 2011

Hello Mongo + Python

A sample of using Python with MongoDB:
1. Install the prerequisites (on Mac/Linux):
$ pip install pymongo

2. Make sure mongod is running, then start Python:
$ python
>>> import pymongo
>>> client = pymongo.MongoClient("localhost")
>>> db = client.test
>>> print(db.mycoll.count_documents({}))

And you are free to write Python code against MongoDB.

Monday, October 31, 2011

Generate data in mongo

I love MongoDB!

This is the "hello world" (simplest) way I have found to generate sample data in MongoDB.

1. Ensure "mongod" is running.
2. Open the mongo shell (run "mongo"; newer MongoDB releases ship "mongosh" instead).
3. for (var i = 0; i < 1000; i++) { db.mycoll.insertOne({"val": i}); }

You now have a collection, mycoll, with 1000 documents.
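You can generate the same 1000 documents from Python. This is a sketch that assumes PyMongo 3.7 or later (for insert_many and count_documents) and a mongod on localhost. The helper names sample_docs and generate are hypothetical:

```python
def sample_docs(n=1000):
    """Build n documents shaped like the shell loop's {"val": i}."""
    return [{"val": i} for i in range(n)]

def generate(collection, n=1000):
    """Insert n sample docs in one batch; returns how many were inserted."""
    result = collection.insert_many(sample_docs(n))
    return len(result.inserted_ids)

if __name__ == "__main__":
    # Assumes PyMongo is installed and mongod is listening on localhost.
    from pymongo import MongoClient
    coll = MongoClient("localhost").test.mycoll
    print(generate(coll), "inserted;", coll.count_documents({}), "docs total")
```

The shell loop makes one insert call per document. insert_many sends the whole batch at once, which is usually faster for sample data.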

Friday, May 20, 2011

python awesomeness 1

I am sure many folks go through this phase when writing Python after coming from other languages.

return dict(zip(x, range(len(x))))

is a line of code I needed.

It converts a sorted list into a dictionary that maps each list item to its index.
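Here is that line wrapped in a runnable function, plus the enumerate form, which many people find easier to read. The function names are hypothetical. If the list contains duplicates, both versions keep the index of the last occurrence.

```python
def index_of(items):
    """Map each list item to its position: ['a', 'b'] -> {'a': 0, 'b': 1}."""
    return dict(zip(items, range(len(items))))

def index_of_enumerate(items):
    # Same result, using enumerate instead of zip/range.
    return {item: i for i, item in enumerate(items)}

print(index_of(["apple", "kiwi", "pear"]))
# {'apple': 0, 'kiwi': 1, 'pear': 2}
```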

Thursday, January 20, 2011

aws

If ec2-describe-* commands return nothing/blank, the default region they point to might be incorrect.
In my case,
ec2-describe-instances
ec2-describe-snapshots
etc. all returned blank.
Setting
export EC2_URL=https://us-west-1.ec2.amazonaws.com
in your shell fixes it (use the endpoint for whichever region your resources are actually in).
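The same silent-empty-result pitfall applies if you use Python's boto3 instead of the old command-line tools. This is a sketch that assumes boto3 is installed and credentials are configured. It pins the region explicitly, and instance_ids is a hypothetical helper:

```python
def instance_ids(response):
    """Pull instance IDs out of an EC2 DescribeInstances response dict."""
    return [inst["InstanceId"]
            for res in response.get("Reservations", [])
            for inst in res.get("Instances", [])]

if __name__ == "__main__":
    # Assumes boto3 is installed and AWS credentials are configured.
    import boto3
    # Pin the region: an empty list from the wrong region looks exactly
    # like "no instances". This reads only the first page of results.
    ec2 = boto3.client("ec2", region_name="us-west-1")
    print(instance_ids(ec2.describe_instances()))
```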