Explode map of map field in Spark Scala

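import org.apache.spark.sql.functions.{col, explode, udf}

// Turn the outer map into an array of (key, innerMap) pairs so that it can be exploded.
// Double matches the value type shown in the schema below.
val flattenMap = udf((map: Map[String, Map[String, Double]]) => map.toArray)
val explodedDf = df.select(col("query"), explode(flattenMap(col("column_name"))))
explodedDf.printSchema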
root
 |-- query: string (nullable = true)
 |-- col: struct (nullable = true)
 |    |-- _1: string (nullable = true)
 |    |-- _2: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: double (valueContainsNull = false)

Git rebase and squash commits

When I work on a feature branch that takes a couple of weeks to finish, I don’t want to diverge too much from upstream. To keep my branch up to date, I rebase onto upstream once every few days. This is my workflow.

  1. Fetch upstream
    git fetch upstream
    
  2. Rebase the feature branch onto upstream
    git rebase upstream/master
    

While working on a feature branch, I usually make several small commits. Before creating a pull request to merge the changes into upstream, it is good practice to squash the small commits into one. This is my workflow for squashing the commits.

  1. Find the common ancestor of the feature branch and the upstream branch
    git merge-base HEAD upstream/master
    
  2. Squash the commits using an interactive rebase onto the commit hash from step 1.
    git rebase -i <commit hash from step 1>
    
  3. When the editor opens so you can select which commits to keep and which to squash, change all but the first commit from pick to squash (or s for short). Save and quit the editor.
  4. The editor opens again for the message of the squashed commit. Either keep the existing messages as they are (not a good practice) or delete them and write a new, meaningful commit message. Save and quit the editor.

Embedding Lua engine in C/C++

Lua is a very lightweight language that is well suited to being embedded as a scripting language in a larger system.

The following steps embed the Lua engine in a C/C++ program. The examples use the Lua 5.1 API.

Step 1. Include header files.

extern "C" {
#include <lua5.1/lua.h>
#include <lua5.1/lauxlib.h>
#include <lua5.1/lualib.h>
}

Step 2. Initialize Lua engine

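lua_State *engine = luaL_newstate(); /* lua_open() is the Lua 5.1 alias for this call */
if (engine == NULL) {
  /* not enough memory: do not continue */
}
/* Open all standard libraries. In Lua 5.1 the individual luaopen_* functions
   (luaopen_base, luaopen_string, luaopen_math, luaopen_table) must be called
   through Lua (lua_pushcfunction + lua_call), not directly as C functions. */
luaL_openlibs(engine);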

Step 3. Load Lua script from buffer

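char *buf;
// read the NUL-terminated script text into buf
// luaL_loadbuffer returns 0 on success (LUA_OK is only defined from Lua 5.2 on);
// "script" is the chunk name used in error messages
if (luaL_loadbuffer(engine, buf, strlen(buf), "script") != 0) {
  fprintf(stderr, "loading script failed with error %s\n", lua_tostring(engine, -1));
  lua_pop(engine, 1);
  // do not continue
}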
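
Step 3 leaves reading the script into the buffer as a comment. The following is a minimal sketch of that part, assuming the script is stored in a file. The helper name read_script is hypothetical and is not part of the Lua API.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the Lua API): read a whole script file into memory.
std::string read_script(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    throw std::runtime_error("cannot open script file: " + path);
  }
  std::ostringstream contents;
  contents << in.rdbuf();  // copy the entire file into the string stream
  return contents.str();
}
```

The returned string could then be passed to the load call as luaL_loadbuffer(engine, script.data(), script.size(), path.c_str()). Using size() instead of strlen should also keep scripts that contain embedded zero bytes intact.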

Step 4. Execute the loaded buffer

// lua_pcall returns 0 on success (LUA_OK is only defined from Lua 5.2 on)
if (lua_pcall(engine
                 , 0  // number of arguments on the stack
                 , 0  // number of expected results
                 , 0  // no error handler function
                 ) != 0) {
  fprintf(stderr, "executing script failed with error %s\n", lua_tostring(engine, -1));
  lua_pop(engine, 1);
  // do not continue
}

See the Lua reference manual for more about lua_pcall.

Step 5. Register C function to be called within Lua script

// the following function can be called from a Lua script with 1 argument
int add_ten(lua_State *engine) {
   double p = lua_tonumber(engine, 1);
   p = p + 10.0;
   lua_pushnumber(engine, p);
   return 1;
}

lua_register(engine, "add_ten", add_ten);

// call add_ten within engine
lua_getglobal(engine, "add_ten");
lua_pushnumber(engine, 50.0);
lua_call(engine, 1, 1); // pass 1 argument and get back 1 result
double result = lua_tonumber(engine, -1); // fetch from top of stack
lua_pop(engine, 1); // remove the result so the stack stays balanced
printf("result = %g\n", result); // this should print 60

Step 6. Destroy Lua engine

lua_close(engine);

Setting up tags for vim

I work in multiple git repositories, and I use Vim as my editor, especially for C and C++ code.

ctags is a great tool for generating a cross-reference tags file. Vim supports tags files for jumping between source files, but it requires the tags option to be set to the path of the tags file generated by ctags. Since I work in multiple git repositories, hard-coding a single tags file path in .vimrc is not an option for me. So I came up with a .vimrc script that sets the tags file based on the current repository.


let s:top_level_cmd = "git rev-parse --show-toplevel"
let s:top_level_dir = system(s:top_level_cmd)
if !v:shell_error
  " strip the trailing newline from the git output and use <repository root>/tags
  let s:tag_file = substitute(s:top_level_dir, "\n", "", "g") . '/tags'
  execute 'set tags=' . s:tag_file
endif

Human Vs. Machine

Two days ago I was watching a movie called “Pasand Apni Apni” (a Hindi-language Bollywood movie) with friends. In the middle of the movie, one friend asked, “The sequence in this movie is similar to one in a recent movie. Which one is that?” Everyone came up with the same answer: “Ghajini” (another Hindi-language Bollywood movie, a kind of remake of the Hollywood movie “Memento”).

One may ask what is so great about that. You have all watched both movies and came up with the answer. That’s right: there is nothing great about it as long as you are a human. Nobody told us the story of either movie in words; we derived it by watching the sequences. All the actors and actresses became abstract entities. We were able to mask the people in the movie, learn from the sequences and the dialogue, and finally reach a conclusion about the story. This ability is unique to humans.

If you showed the same movies, and several others, to a machine (say, IBM’s Watson) and asked it the question my friend asked, would it be able to come up with the answer? I guess not (not yet). That is because Watson is programmed to pick out the keywords in the question and search a vast database to come up with the answer. The keywords in this question do not link these two movies in any way. It was the human ability to analyze the question using not just its keywords but also its context that produced the answer.

No machine (not even Siri) is programmed to do such a thing yet. Only time will tell what the future holds.

Profiling C++ code using Google Performance Tools

Google Performance Tools (gperftools) is an excellent toolkit for profiling C++ code.

Install libunwind

wget http://download.savannah.gnu.org/releases/libunwind/libunwind-1.0.1.tar.gz
tar xvfz libunwind-1.0.1.tar.gz
cd libunwind-1.0.1
export CFLAGS=-U_FORTIFY_SOURCE
./configure
make
sudo make install

Install Google Performance Tools

wget http://google-perftools.googlecode.com/files/google-perftools-1.8.3.tar.gz
tar xvfz google-perftools-1.8.3.tar.gz
cd google-perftools-1.8.3
./configure
make
sudo make install

Install supporting packages

sudo apt-get install graphviz
sudo apt-get install gv

Include profiling code in C++

Include supporting header file(s)

#include <google/profiler.h>

Start Profiler

The following call starts the profiler and writes its output to the file /tmp/output.pprof. It can be placed anywhere in the code; this is the point at which the profiler starts collecting data. Profiling can also be started at several stages of the program, each writing to a different file.

ProfilerStart("/tmp/output.pprof");

Stop Profiler

The profiler can be stopped at any point by placing the following call.

ProfilerStop();
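
Pairing every ProfilerStart with a ProfilerStop by hand can get awkward when a function has several exit paths. One possible way to keep them paired is a small RAII guard, sketched below. ScopedProfile, busy_work and the USE_GPERFTOOLS macro are hypothetical names chosen for illustration. When the macro is not defined, the two profiler calls are replaced by no-op stand-ins so that the sketch compiles without gperftools installed.

```cpp
#include <cassert>
#include <cstdint>

#ifdef USE_GPERFTOOLS         // hypothetical opt-in: build with -DUSE_GPERFTOOLS -lprofiler
#include <google/profiler.h>  // same header as in the section above
#else
// No-op stand-ins (an assumption for illustration) so the sketch builds without gperftools.
inline int ProfilerStart(const char *) { return 1; }
inline void ProfilerStop() {}
#endif

// Hypothetical RAII guard: starts profiling on construction, stops on destruction.
class ScopedProfile {
 public:
  explicit ScopedProfile(const char *output_file) { ProfilerStart(output_file); }
  ~ScopedProfile() { ProfilerStop(); }
  ScopedProfile(const ScopedProfile &) = delete;
  ScopedProfile &operator=(const ScopedProfile &) = delete;
};

// Example workload to profile: sum of the squares of 0 .. n-1.
std::uint64_t busy_work(std::uint64_t n) {
  ScopedProfile profile("/tmp/output.pprof");  // profiling covers this scope only
  std::uint64_t total = 0;
  for (std::uint64_t i = 0; i < n; ++i) {
    total += i * i;
  }
  return total;  // ProfilerStop() runs here through the destructor
}
```

Calling busy_work from main is enough to produce a profile when the real library is linked in. A second guard with a different file name could cover another stage of the program.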

Run Profiling

Compile the code, link it with the profiler library (-lprofiler), and execute it. Once the run finishes or ProfilerStop is called, the profiler output is available in the output file.

Generate profiling report

Several types of reports can be generated from the output file.

Graphical report

X11 needs to be enabled for the graphical report, because the graph is displayed with gv.

pprof --gv --lines <path to executable> /tmp/output.pprof