Author: vatsal

Design Patterns #2 – Observer Pattern

In the last design patterns post we looked at the strategy pattern and how it is a good idea to decouple your constantly changing code from the rest of your application. In this post we will look at another amazing design pattern called the Observer Pattern.

The classic Gang of Four definition of the Observer Pattern is: “Define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.” This pattern can be understood by looking at a real-life analogy.

Consider a newspaper or a magazine subscription. The publisher starts a business and begins to publish the newspapers and magazines. You, your friends and family subscribe to a particular publisher, and every time there is a new edition, it gets delivered to you and all the other subscribers. As long as you are subscribed, you get all the new and exciting news and magazines. You unsubscribe anytime you feel like you don’t want the newspapers anymore and the delivery stops. So while the publisher remains in business, people, businesses, hotels, airlines, etc. constantly subscribe and unsubscribe to the newspapers and magazines. This is in essence the entire Observer Pattern.

This pattern forms an important part of the Model-View-Controller (MVC) architecture, about which you can read more on Wikipedia. The main idea is that there is a central model and multiple views, which are kept separate, and changes made to the model must be reflected automatically in each of the views.

Now let us look at how to implement this pattern. Imagine you have an object that is the source of news or new information. In technical parlance this object is called the Subject. The listeners or subscribers to the Subject are called the Observers (no surprises here!). Suppose we want to develop an application that uses an external API to read stock prices and then updates some widgets on our system. The widgets might display the current stock price, the beta value and the volume. We will have a Subject interface that provides methods for registering observers, removing observers and notifying observers of changes. Our concrete Subject class will implement this interface, apart from having some custom methods to get the data from the API. We will also have an Observer interface that specifies an update() method and some other methods for displaying the data. All our widgets will be concrete implementations of this Observer interface and will implement the update() and display() methods in their own ways. The key point is that as soon as the Subject gets an update from the API that a value has changed, it calls its notifyObservers() method. This method in turn calls the update() method on each observer in its observer list and passes the new data to it. This way all the observers receive the updated data at the same time.

The diagram below should make it clearer.


Java provides you with an Observable class, which simplifies a lot of this for you. The main difference is that every time there is a change, the concrete subject (our StockData class) first calls the setChanged() method and then the notifyObservers() method, passing the new value as its argument; the update() method in the observers then receives this new value instantly. Be careful though: Observable in Java is a class, not an interface. This leads to some limitations. You cannot add the observable behavior to an existing class that already extends another superclass, and you cannot create your own implementation that plays well with Java’s built-in Observer API. You also cannot call the setChanged() method directly, as it is protected, which means you must first subclass Observable.
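Putting the pieces together, here is a minimal sketch of the pattern (in Python rather than the Java version the post has in mind, and without the Observable class; the StockData and PriceWidget names are illustrative):

```python
class Subject:
    """Keeps a list of observers and notifies each of them on changes."""
    def __init__(self):
        self._observers = []

    def register_observer(self, observer):
        self._observers.append(observer)

    def remove_observer(self, observer):
        self._observers.remove(observer)

    def notify_observers(self, data):
        # push the new data to every registered observer
        for observer in self._observers:
            observer.update(data)


class StockData(Subject):
    """Concrete subject: in a real app this would poll the external API."""
    def new_measurement(self, price, volume):
        self.notify_observers({"price": price, "volume": volume})


class PriceWidget:
    """Concrete observer: displays the current stock price."""
    def update(self, data):
        self.display(data["price"])

    def display(self, price):
        print("Current price:", price)


stock_data = StockData()
stock_data.register_observer(PriceWidget())
stock_data.new_measurement(price=101.5, volume=20000)  # prints "Current price: 101.5"
```

In Java, Subject and Observer would be interfaces; Python’s duck typing lets any object with an update() method act as an observer.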

Use your best judgment to determine which way to implement this pattern in your applications. I hope you learned something new and interesting in this post.

Stay tuned for more 🙂

Design Patterns #1 – Strategy Pattern

In the past two years, as I delved deeper into the world of software development and maintenance, I realized that there is a thin line that separates maintainable code from messy code. Staying on the right side of that line can save you hundreds of hours in new releases, development and maintenance, and it largely separates those who are familiar with and follow design patterns from those who don’t. Honestly speaking, I am ashamed that this is not one of those things taught in the undergraduate curriculum for Computer Science. It definitely deserves a mention after students are familiar with the concepts of Object Oriented Programming (OOP), because in the real world, more time is spent maintaining and changing software than on initial development. This is why there should be a significant effort towards code reuse and extensibility as well as maintainability.

Design patterns help developers create functional, elegant, reusable and flexible software [Head First Design Patterns]. Patterns help you get to the final product faster by providing general solutions to common problems that other developers have already faced. They take your basic OOP knowledge one level up and help you build good systems.

A widely quoted definition states: “In software engineering, a design pattern is a general repeatable solution to a commonly occurring problem in software design. A design pattern isn’t a finished design that can be transformed directly into code. It is a description or template for how to solve a problem that can be used in many different situations.”

Now let us look at our first design pattern – the Strategy Pattern.

According to Head First Design Patterns, “The strategy pattern defines a family of algorithms, encapsulates each one, and makes them interchangeable. Strategy lets the algorithm vary independently from clients that use it.”

Now let’s translate this definition into simpler terms. What the strategy pattern advises is to separate the parts of the code that change from those that stay the same: put the different behaviors behind interfaces and implement each separate behavior through its own class. Let us look at an example now.

Suppose you are developing a game like Counter-Strike. Each character in the game can use one weapon at a time, but can change weapons at any time during the game. There are also multiple types of players with different outfits. One way to implement this would be to have a single Soldier class from which all the characters inherit. But if tomorrow you want to add a dummy soldier who cannot shoot or run, you will have to rewrite a lot of classes. And if every six months you want to add new characters with different capabilities, inheritance is not a very good way to go.

One way to solve this problem is to separate the characters from their behaviors. You can have a Character class from which all your soldiers inherit. You can have a WeaponBehavior interface that specifies how to use the weapon through a useWeapon() method, and your Character class can have an instance variable declared as the WeaponBehavior interface type. Each weapon – a gun, a knife, a grenade, a smoke bomb, a rifle, etc. – implements the useWeapon() method of the WeaponBehavior interface in its own unique way. This way, if tomorrow you want to add a new weapon, say a rocket launcher, you can simply declare a new RocketLauncher class that implements useWeapon() to launch a rocket. If you would like to add a new character, say a team of elite jokers who terrorize everyone, you can simply add a class for their weapon and a class for their character, and you are done. You don’t need to touch any other piece of existing code. Moreover, all your existing characters can now use the new weapons, and your new jokers can also use the previously existing weapons.


So the main aim is to separate the algorithms, or varying behaviors, from the clients that use them. The image above should make it clearer. You can also refer to the strategy pattern video by Derek Banas for more information.
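The design described above can be sketched as follows (a minimal Python sketch of the pattern the post describes in Java terms; the concrete weapon classes are illustrative):

```python
class WeaponBehavior:
    """Strategy 'interface': every weapon implements use_weapon() in its own way."""
    def use_weapon(self):
        raise NotImplementedError


class Knife(WeaponBehavior):
    def use_weapon(self):
        return "slashing with a knife"


class RocketLauncher(WeaponBehavior):
    # a new weapon only requires a new class; no existing code changes
    def use_weapon(self):
        return "launching a rocket"


class Character:
    """Holds a reference to a WeaponBehavior and delegates to it."""
    def __init__(self, weapon):
        self.weapon = weapon

    def set_weapon(self, weapon):
        # weapons can be swapped at runtime
        self.weapon = weapon

    def fight(self):
        return self.weapon.use_weapon()


soldier = Character(Knife())
print(soldier.fight())               # slashing with a knife
soldier.set_weapon(RocketLauncher())
print(soldier.fight())               # launching a rocket
```

Character never needs to know which concrete weapon it holds, which is exactly what lets new weapons and characters be added independently.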

I hope you understood the need for the knowledge of design patterns and the way the strategy pattern can be utilized. Have a great day developing code and always remember that more time is spent in maintaining and changing software than on initial development. So make it easier for your next incarnation to maintain and extend the code you write in this life.

Stay tuned for more design patterns 🙂

Stochastic Gradient Descent – for beginners

Warning: This article contains only one mathematical equation which can be understood even if you have only passed high school. No other mathematical formulas are present. Reader discretion is advised.

If you have ever taken a Machine Learning course or even tried to read a bit about regression, it is inevitable that you came across a term called Gradient Descent. The name contains all the logic behind the algorithm: descend down a slope. Gradient Descent is a way to minimize a function by determining the slope of the function and then taking a small step in the opposite direction of the slope, i.e. going a step downhill. As we go through multiple iterations, we reach a valley.

The equation for the algorithm is:

θ = θ − η · ∇J(θ)                                                              (1)

Here ∇J(θ) is the gradient (the slope) of the function J(θ). We multiply it by a learning-rate parameter η, which determines how big a step we take, and then adjust our parameter θ in the direction opposite to the gradient.


The image above should make it clearer.
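Concretely, one run of equation (1) looks like this (a Python toy minimizing J(θ) = θ², whose gradient is 2θ; the starting point and learning rate are arbitrary):

```python
def grad_J(theta):
    # gradient of J(theta) = theta ** 2
    return 2 * theta

theta = 4.0   # arbitrary starting point
eta = 0.1     # learning rate

for _ in range(100):
    theta = theta - eta * grad_J(theta)   # equation (1)

print(theta)  # very close to 0, the minimum of J
```

Each iteration multiplies θ by (1 − 2η) = 0.8, so θ shrinks steadily towards the minimum at 0.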

Now this gradient calculation and update is a resource-intensive step. By some estimates, if an objective function takes n operations to compute, its gradient takes about 3n. We also have lots of data, and gradient descent has to go over it many times. The step has to be repeated for all the θs and all the rows of the data-set. All this requires a huge amount of computing power.

But we can cheat. Instead of computing the exact objective or loss function, we will compute an estimate of it – a very bad estimate. We will compute the loss for some random sample of the training data, compute the gradient only for that sample, and pretend that this derivative is the right direction to go.

So now each step is a very small, cheap step, but the price we pay is a much higher number of steps to reach the minimum, instead of a few large ones.

However, computationally, we win by a huge margin overall. This technique of using sampling for the gradient update is called Stochastic Gradient Descent. It scales well with both the data and the model size, which is great since we want both big data and big models.
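The sampling trick can be sketched like this (an illustrative Python toy with a single parameter and a made-up noiseless data-set; real SGD would use mini-batches over real data):

```python
import random

# toy data generated from y = 3 * x; SGD should recover theta close to 3
data = [(x, 3.0 * x) for x in range(1, 11)]

theta = 0.0    # model: y_hat = theta * x
eta = 0.005    # learning rate
random.seed(0)

for _ in range(2000):
    x, y = random.choice(data)        # one random sample, not the full data-set
    grad = 2 * (theta * x - y) * x    # gradient of (theta*x - y)**2 for this sample
    theta = theta - eta * grad        # same update rule, just a noisy gradient

print(theta)  # close to 3
```

Each individual gradient is a bad estimate of the true one, yet the updates still drift towards the right value, which is the whole point of the technique.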

SGD is, however, a pretty bad optimizer on its own and comes with a lot of issues in practice. I would suggest Sebastian Ruder’s blog post on gradient descent optimization algorithms for more detailed explanations, variations and implementations.

Some tips to help Stochastic Gradient Descent: normalize inputs to zero mean and equal variances, and initialize weights randomly with zero mean and equal variances as starting points.
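The first tip can be sketched in plain Python (toy numbers; real code would use a library like NumPy):

```python
def normalize(values):
    """Rescale one feature column to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

normed = normalize([10.0, 20.0, 30.0, 40.0])
print(normed)  # zero mean, unit variance
```

Applying this to every feature column gives all inputs zero mean and equal variances, so no single feature dominates the gradient.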


Convert ‘csv’ format files to ‘libsvm’ data format

A few days ago I started doing some predictive analytics using Apache Spark’s MLlib. MLlib is a machine learning library that provides support for a large number of popular machine learning algorithms in Scala, Python and Java.

However, as is the case when running many ML programs, the input data format differs from case to case. I wanted to classify data into different categories, and I decided to use MLlib’s Multilayer Perceptron Classifier, which is a classifier based on a feedforward artificial neural network. You can read more about it in the Spark MLlib documentation.

The input data format to run analysis using this algorithm required data to be in ‘libsvm’ format. The format looks something like this:

5 9:0.0127418 10:0.06200549 11:1 12:1 13:0.02847017 14:0.05982561
3 3:0.001177284 4:0.01679315 7:1 8:1 9:0.0233416 10:0.08687227 11:0.007628717 12:0.01832714 13:0.003491035 14:0.01856935
2 1:0.01250612 2:0.05098133 5:1 6:1 9:0.01482266 10:0.01268549 11:0.0142893 12:0.02920057 13:0.1376151 14:0.183461
5 5:0.001757722 6:0.01785289 7:0.002907001 8:0.01801159 9:0.01303587 10:0.07466476 11:1 12:1 13:0.02893818 14:0.0608585

The values are in the following format:

label col1:val1 col2:val2 ………. colN:valN

The label is simply the class or category of the record, and the space-separated index:value pairs give the non-zero values in the various columns of the data-set. So for the example data-set, the category label of the first record is ‘5’, and columns 9 through 14 have non-zero values in them, which are given after the colon (:).

We basically want to use a compressed row storage (CRS) format which puts the subsequent non-zeros of the matrix rows in contiguous memory locations.

Now for my case, I had a comma separated values (*.csv) file in the following format:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
0, 0, 0, 0, 0, 0, 0, 0, 0.012741798, 0.06200549, 1, 1, 0.028470168, 0.05982561, 5
0, 0, 0.001177284, 0.016793154, 0, 0, 1, 1, 0.023341597, 0.086872275, 0.007628717, 0.018327144, 0.003491035, 0.018569352, 3
0.01250612, 0.050981331, 0, 0, 1, 1, 0, 0, 0.014822657, 0.012685495, 0.014289297, 0.029200574, 0.137615081, 0.183460986, 2
0, 0, 0, 0, 0.0017572, 0.017852892, 0.002907001, 0.018011585, 0.013035873, 0.074664762, 1, 1, 0.028938184, 0.060858526, 5

The first line has the column headers. The last (15th) value on each line is the class label. The simplest way to convert this csv file to the libsvm format is to use two R packages – e1071 and SparseM.

The following code takes the csv file as input and converts it into a txt file in libsvm format:

# install the e1071 package (provides write.matrix.csr)
install.packages('e1071')

# install the SparseM package (provides as.matrix.csr)
install.packages('SparseM')

# load the libraries
library(e1071)
library(SparseM)

# load the csv dataset into memory
train <- read.csv('inputFilePath.csv')

# convert labels into numeric format
train$X15 <- as.numeric(train$X15)

# convert from data.frame to matrix format
x <- as.matrix(train[,1:14])

# put the labels in a separate vector
y <- train[,15]

# convert to compressed sparse row format
xs <- as.matrix.csr(x)

# write the output libsvm format file 
write.matrix.csr(xs, y=y, file="out.txt")

With these 10 lines we now have a text file in the desired libsvm format, ready to be loaded into Spark for further number crunching.
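If you would rather stay out of R, the same conversion can be sketched in plain Python (assuming the layout above: a header row, 14 feature columns and the label in the 15th column; the file names are illustrative):

```python
import csv

def csv_to_libsvm(in_path, out_path):
    """Write each csv row as 'label index:value ...', keeping only non-zero features."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        reader = csv.reader(fin)
        next(reader)                          # skip the header row
        for row in reader:
            *features, label = [v.strip() for v in row]
            pairs = ["%d:%s" % (i + 1, v)     # libsvm column indices are 1-based
                     for i, v in enumerate(features) if float(v) != 0.0]
            fout.write(label + " " + " ".join(pairs) + "\n")

# csv_to_libsvm("inputFilePath.csv", "out.txt")
```

Unlike the R route, this streams the file row by row, so it also works for data-sets that do not fit in memory.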

Hope it was helpful.

Stay tuned 🙂

Found a few amazing blogs for R and Data Science enthusiasts

Today I had a bit of free time, and since I had not opened my R console in a really long time, I decided to try a few scripts that could do something interesting. Going through my R-bloggers mails, I found quite a few interesting posts, and I thought of putting them here so that I don’t lose them. Hope you enjoy them too. 🙂

Wordclouds with R! – as simple as it can get

Recently I started a wonderful course titled “MITx-15.071X – The Analytics Edge” on edX. In my experience it is the best course for getting quick hands-on experience with real-world data science applications. If you have already done Stanford’s Machine Learning course on Coursera, then I would say it’s a great follow-up for learning to apply the algorithms in R.

Now coming to the main point at hand – wordclouds. Visualizations are a great way to present information in layman’s terms to people who might not be too scientifically or mathematically oriented. Imagine you have to find the most important words in a text and present them. You could create a table, but it would be dull and might not appeal to everyone. Wordclouds are a great way to overcome this. R provides an extremely simple way to create wordclouds with just about 10 lines of code. So let’s dive into it.

Step 1: Save your text in a simple notepad text file. For this post I will use an excerpt from the Military-Industrial Complex Speech by Dwight D. Eisenhower, from 1961, which can easily be found online.

Save the text in a simple .txt file and add an empty line at the end. The reason for this will become clear in the next step.

Step 2: Open the file in R using the command

speech = readLines("Eisenhower.txt")

If you had not added an empty line, you would get a warning message saying:

incomplete final line found on 'Eisenhower.txt'

This is because readLines() expects the file to end with a newline character; without one it warns about an incomplete final line.

Step 3: Now we need to download and install 3 packages in R – tm, wordcloud and RColorBrewer:

install.packages("tm")
install.packages("wordcloud")
install.packages("RColorBrewer")

Then load these packages using:

library(tm)
library(wordcloud)
library(RColorBrewer)

Step 4: This is one of the most important steps in the process. We will use the text-mining package that we just loaded and use it to modify and clean out our text.

First we convert our text to a specific class of R which provides infrastructure for natural language text called Corpus.

eisen = Corpus(VectorSource(speech))

Then we remove all the whitespaces from the text.

eisen = tm_map(eisen, stripWhitespace)

Next we convert all the letters to their lowercase and remove all punctuations.

eisen = tm_map(eisen, tolower)

eisen = tm_map(eisen, removePunctuation)

A speech will contain many common English words like "I", "me", "my", "and", "to", etc. We don’t want these to clutter our cloud, so we must remove them. Fortunately for us, R has a list of such common English stop words that can be accessed using stopwords("english"). We will use this directly.

eisen = tm_map(eisen, removeWords, stopwords("english"))

Looking at the speech I decided to remove three more words using

eisen = tm_map(eisen, removeWords, c("must", "will", "also"))

Next we convert our eisen variable into a plain text format, which is necessary in the newer versions of the tm package.

eisen = tm_map(eisen, PlainTextDocument)

Now we will convert this to a nice table like format which will help us get all the words and their frequencies.

dtmEisen = DocumentTermMatrix(eisen)

eisenFinal = as.matrix(dtmEisen)

You can see the count of various words in the table by using the colnames() and colSums() functions.

table(colnames(eisenFinal), colSums(eisenFinal))

Here the words are given in rows and their counts in the columns.

Now lets us plot this using a simple wordcloud.

wordcloud(colnames(eisenFinal), colSums(eisenFinal))

You will get a very basic wordcloud as such:


We can use the other parameters of the wordcloud function by looking at the documentation. Let’s use them:

wordcloud(colnames(eisenFinal), colSums(eisenFinal),scale=c(4,.5),min.freq=1,max.words=Inf, random.order=FALSE, random.color=FALSE, rot.per=.5, colors=brewer.pal(12, "Paired"), ordered.colors=FALSE, fixed.asp=TRUE)

To find out what each of these parameters does, please refer to the documentation. It’s extremely simple.

Our new plot looks something like this:


You can also type

display.brewer.all()

to view the different color palettes that can be passed to the "colors" parameter and experiment with various combinations.

Well there you go. You can now create and publish exciting wordclouds within seconds using R.

Have fun!!!

Add Horizontal Scroll Bar for IDLE

Since last week, I have been spending a lot of time scripting in Python, and one of the most annoying things I found was scrolling through long lines of code that extend beyond my screen width. I realized that the absence of a horizontal scroll bar was a big problem. Luckily I found a solution online for adding a horizontal scroll bar to IDLE by modifying the EditorWindow.py file located in the “….\Python34\Lib\idlelib” directory (check the directory where Python was installed).

To make the changes in IDLE, open EditorWindow.py and search for ‘vbar’, which appears in the EditorWindow class’s __init__ method.
Add the lines that have ### appended to them and then restart IDLE.

self.vbar = vbar = Scrollbar(top, name='vbar')
self.hbar = hbar = Scrollbar(top, orient=HORIZONTAL, name='hbar')   ###

vbar['command'] = text.yview
vbar.pack(side=RIGHT, fill=Y)
hbar['command'] = text.xview ###
hbar.pack(side=BOTTOM, fill=X) ###

text['yscrollcommand'] = vbar.set
text['xscrollcommand'] = hbar.set ###