Saturday, June 16, 2012

Submitting an HPC job from on-premises - Part 2 of 2


Continuing from my previous article, today I am going to share one more experience with HPC. Nowadays there are lots of links available on the net which tell you how to submit a job to the Azure HPC Scheduler. I explored most of them and found that almost all of them talk about job submission from within the cloud (the front or head node). But nowhere was it mentioned how to submit a job from outside the cloud. I came across this issue while working on one of my assignments.

After much head-scratching, I finally found a way to do this. When we talk about interaction between the cloud and on-premises machines, the first thing that comes to mind is the network. Plain TCP definitely won’t work here. There are a few transport schemes available in the Azure HPC framework, and after exploring them I found WebAPI, which suited my requirement. I simply used WebAPI as the transport scheme and changed the head node name to the complete host name (i.e. <headNodeName>.cloudapp.net), and that resolved my issue. Sample code is below:
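The resulting client code looks roughly like this (a minimal sketch, assuming the Azure HPC Scheduler SDK assemblies are referenced; the head node name, service name, and credentials are placeholders for your own deployment):

```csharp
using System;
using Microsoft.Hpc.Scheduler.Session;

class Program
{
    static void Main()
    {
        // Use the full public host name of the head node, not just the node name.
        // "myhpccluster" and "EchoService" are placeholders for your own deployment.
        SessionStartInfo info = new SessionStartInfo("myhpccluster.cloudapp.net", "EchoService");

        // WebAPI is the transport scheme that works from outside the cloud deployment.
        info.TransportScheme = TransportScheme.WebAPI;

        // Credentials of a user configured on the Azure HPC Scheduler deployment.
        info.Username = "azureuser";
        info.Password = "<password>";

        using (Session session = Session.CreateSession(info))
        {
            Console.WriteLine("Session created: " + session.Id);
        }
    }
}
```

From here, jobs and requests can be sent exactly as they would be from the head node itself.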

Note:
If everything (service name, head node, etc.) is configured properly and you are still getting an exception like ‘Fail to get the cluster name’, then just redeploy the HPC Scheduler package and try again. I know this is painful, but it is the only solution I came across, and believe me, it did the trick for me.

Hope it will be helpful for those who get stuck with a similar kind of issue.

Saturday, June 2, 2012

My first day on Microsoft Windows Azure HPC - Part 1 of 2


Recently I entered Azure’s HPC world and thought I would share something I learnt from my experience as a beginner. Today was my first step towards Azure HPC. I downloaded the sample service and followed the configuration instructions, but even after following those, I faced a few difficulties along the way. So here I am sharing that experience. Hope it will be useful for all newbies.

I feel the points below are directly related to application performance.

1)      Affinity Group – While working on the cloud, the first things that come to mind are performance and throughput. To gain better throughput and performance, one needs to make sure that all the services and storage accounts hosted on the cloud are located in proximity, which reduces data transfer time. To bring all the services and storage closer, we can keep them in a group, and in Azure terms this group is called an Affinity Group. Creating an affinity group can increase data transfer speed to a great extent, so it is always good practice to create our own affinity group as the first and foremost step.

2)      Storage Account and Affinity Group – Whenever HPC is configured using the sample service, one storage account is created automatically. This storage account can be created anywhere, irrespective of the affinity group we created. One important point here is to check the affinity group of this newly created storage account. If it is in the same affinity group we created for our app, then great, we need not do anything :) .

But if it is not in the same group, we have to bring that storage account into our already created affinity group. As of today, there is no direct way to move a storage account into a desired affinity group; in other words, a storage account cannot switch between affinity groups. The only solution available is to delete the automatically created storage account and create a new one. While creating the new storage account from scratch, we can easily assign the desired affinity group, and at the same time skip the statements in the Azure sample service code where it tries to create the storage account.

3)      Location – While configuring the HPC scheduler, the following steps take place:
  •  Select a location (e.g. South East Asia)
  •  The HPC scheduler randomly selects one of the configured database servers located in the selected location (here, South East Asia)
  •  Once the database server is selected, the next step is to supply credentials to access/create the database
  •  Now the issue here is how to get user credentials for this randomly selected database server.

To overcome this issue, as of today, the only possible solution is to modify the source code in the sample service and specify the name of the desired database server for which the user credentials are known.


More info - Whenever someone talks about the HPC scheduler, the first things that come to mind are the head node and the compute nodes. The head node plays a vital role in the Windows Azure HPC Scheduler: its main purpose is to distribute jobs/tasks among all available compute nodes. One can learn more about the HPC Scheduler at http://msdn.microsoft.com/en-us/library/windowsazure/hh545593.aspx . One can download the sample (Windows Azure HPC Scheduler code sample) from the Azure website http://code.msdn.microsoft.com/windowsazure/Windows-Azure-HPC-7d75eb26 and follow the steps to configure the HPC Scheduler.


Sunday, May 27, 2012

Split multi page tiff file - (C# code attached)

While working with image files, one of the biggest constraints is file size. When a file is too big, it takes too much time to process and load. To address this, we can split one big TIFF image into its individual pages. This code sample explains how to work with TIFF (Tagged Image File Format) files using C#.NET: it covers splitting a multipage TIFF file into multiple TIFF files and reading the properties of a TIFF file. To split a multipage TIFF file, three main steps are required: 1) get the total number of pages in the TIFF file, 2) get the encoder information for the TIFF format, 3) save each page of the multipage TIFF file into a separate TIFF file. TIFF also has predefined tag types, which tell you what type each item's value is...
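The three steps can be sketched in C# roughly as follows (a minimal version, assuming Windows/GDI+ is available for System.Drawing; the file paths and class name are placeholders):

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;

class TiffSplitter
{
    // Splits a multipage TIFF into single-page TIFFs named <prefix>0.tif, <prefix>1.tif, ...
    public static void Split(string inputPath, string outputPrefix)
    {
        using (Image tiff = Image.FromFile(inputPath))
        {
            // Step 1: get the total number of pages (frames) in the TIFF.
            int pageCount = tiff.GetFrameCount(FrameDimension.Page);

            // Step 2: get the encoder information for the TIFF format.
            ImageCodecInfo tiffCodec = ImageCodecInfo.GetImageEncoders()
                .First(c => c.MimeType == "image/tiff");

            // Step 3: save each page into a separate TIFF file.
            for (int page = 0; page < pageCount; page++)
            {
                tiff.SelectActiveFrame(FrameDimension.Page, page);
                tiff.Save(outputPrefix + page + ".tif", tiffCodec, null);
            }
        }
    }
}
```

A call like `TiffSplitter.Split(@"C:\temp\scan.tif", @"C:\temp\page")` would then produce one file per page.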


For more details, please visit my post titled 'Split multi page tiff file' on MSDN.

Sunday, May 20, 2012

Performance analysis for String and StringBuilder


Sometimes small changes in our code make a huge difference to performance. There are many tips and tricks available, and here I am going to discuss one of them: String vs StringBuilder. One needs to be very careful while playing with strings, because memory-wise strings have a huge impact. I know there are lots and lots of articles on the net about String and StringBuilder, but I am still going to demonstrate the difference using some statistics.

Here I am taking a .NET Framework 4.0 C# console application with different static methods to showcase my analysis. Basically, I have a String variable named outputString, and I loop 1000 times, concatenating a string onto outputString each time.
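A minimal sketch of that loop (everything except the variable outputString is illustrative):

```csharp
using System;

class StringConcatDemo
{
    static void Main()
    {
        string outputString = string.Empty;

        // Each + concatenation allocates a brand new string object,
        // so this loop creates on the order of 1000 intermediate strings.
        for (int i = 0; i < 1000; i++)
        {
            outputString = outputString + "test";
        }

        Console.WriteLine(outputString.Length); // 4000
    }
}
```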


Please note that the concatenation is done using the + symbol. Internally, whenever concatenation is done with +, a new String object is created every time. So in my snippet, looping 1000 times creates 1000 String objects, and each time the new object replaces the value of the variable outputString. That is why string concatenation with the plus (+) sign is definitely going to cost our application performance.

Well, I guess this much boring theory is enough. Let's move on to the statistics.

Here I am using CLR Profiler, which is really one of the good tools for analysing code performance. This tool tells us how many memory bytes are consumed, how the Garbage Collector performs, and how many objects it moves into the Gen0, Gen1 and Gen2 buckets. At the same time, the statistics provided by this tool are very easy to understand.

Ok, I just ran CLR Profiler for the above code and got the statistics below. I am not going to cover GC generations in detail here, but I would like to touch on them a bit. One must know that all objects created in an application first go into the Gen0 bucket; objects that survive a collection are moved to the Gen1 bucket, and objects that survive there are moved to the Gen2 bucket. The .NET GC visits Gen1 and Gen2 far less frequently than Gen0. This means the GC collects Gen0 frequently, so short-lived objects are released quickly. So if your application creates objects in such a way that lots of them move into Gen1 and Gen2, that is not a good sign.
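A tiny console snippet can make that promotion visible (a sketch; the exact generation numbers depend on GC timing, so they are shown as typical values rather than guarantees):

```csharp
using System;

class GenerationDemo
{
    static void Main()
    {
        object survivor = new object();

        // Freshly allocated objects start in generation 0 (typically prints 0).
        Console.WriteLine(GC.GetGeneration(survivor));

        // An object that survives a collection is promoted (typically prints 1).
        GC.Collect();
        Console.WriteLine(GC.GetGeneration(survivor));

        // Surviving another collection promotes it again (typically prints 2).
        GC.Collect();
        Console.WriteLine(GC.GetGeneration(survivor));

        GC.KeepAlive(survivor); // keep the reference live through the collections
    }
}
```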

Now quickly jumping back to our example:

Here we see that heap bytes are present in all three generations (Gen 0, Gen 1, Gen 2), and even the memory figure is seven digits (2,894,353 bytes). 'Relocated bytes' here means bytes belonging to objects promoted towards Gen1. I am not going to analyse the whole result, but we can already see some negative signs, because some of the objects are falling into the Gen1 and Gen2 buckets as well.

Now, before commenting on it, let's take StringBuilder's data. In this example I created a StringBuilder instance named sb and do the same thing, but instead of a string I take the instance of StringBuilder. With StringBuilder, whenever a value is appended, no new string object is created per concatenation; the characters are written into the builder's internal buffer, which only grows occasionally. So internally it is not creating a new object for every concatenation, and this is the real benefit of StringBuilder compared to the String object.

Although we loop 1000 times, that does not mean we create 1000 string objects. That is how we control memory usage and the creation of new objects. Now let's run the profiler and check out the results.
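The StringBuilder version of the same loop looks roughly like this:

```csharp
using System;
using System.Text;

class StringBuilderDemo
{
    static void Main()
    {
        StringBuilder sb = new StringBuilder();

        // Append writes into the builder's internal buffer; no new string
        // object is allocated on each iteration.
        for (int i = 0; i < 1000; i++)
        {
            sb.Append("test");
        }

        string outputString = sb.ToString(); // one final string allocation
        Console.WriteLine(outputString.Length); // 4000
    }
}
```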



Here we see that the memory bytes are reduced to five digits (92,332) and the relocated bytes are zero. If we look at the heap bytes, they are 0 for Gen1 and Gen2, meaning none of the objects were promoted. All the objects were created in Gen0 and released from Gen0 itself.



So here we notice a significant difference in both memory usage and the GC's bucket movements.

Hence we can conclude that we should prefer StringBuilder over String, especially when we are dealing with concatenation.

Friday, May 18, 2012

BackgroundWorker in .Net Console Application


Today I was surfing the net and came across an interesting question: 'Can the progress event of a BackgroundWorker execute after the completed event?' At first I thought no, but when I tried this in a console application, I was able to reproduce the issue. Now the question is: how does this scenario occur in a console app and not in a Windows Forms app? Pretty interesting, right?


Now coming to Windows Forms: there this issue will never occur, thanks to message-queuing support. The Windows message queue takes very good care of the execution sequence of the events. A progress event may still run after DoWork has completed, but the completion event will always run after all the progress events. The key piece here is the SynchronizationContext, which maintains all this sequencing.


But in a console application, none of the above holds true. There is no SynchronizationContext installed, and the events just end up running on thread-pool threads, which does not guarantee any order.


Test case: I created a console app and used a BackgroundWorker with all the required event handlers. In the progress event handler, I added the lines below:
Console.WriteLine("One");
Console.WriteLine("Two");
Console.WriteLine("Three");
Console.WriteLine("Four");


On executing the console application, I found that the output messages were not in the order I wrote them in code. On standard output I received Two, Three, Four, One; sometimes I received One, Two, Three, Four; and sometimes I even found one of the messages missing, getting only Two, Three, Four. But in a Windows Forms app, I always get the output in the correct order: One, Two, Three, Four.
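The whole test case can be sketched as a self-contained console program (the ManualResetEvent and final sleep are my additions, just to keep the process alive long enough to observe the callbacks):

```csharp
using System;
using System.ComponentModel;
using System.Threading;

class Program
{
    static void Main()
    {
        var worker = new BackgroundWorker { WorkerReportsProgress = true };
        var done = new ManualResetEvent(false);

        worker.DoWork += (s, e) =>
        {
            // In a console app these are posted to thread-pool threads,
            // so their handlers race with the completion handler.
            worker.ReportProgress(50);
            worker.ReportProgress(100);
        };

        worker.ProgressChanged += (s, e) =>
        {
            Console.WriteLine("One");
            Console.WriteLine("Two");
            Console.WriteLine("Three");
            Console.WriteLine("Four");
        };

        worker.RunWorkerCompleted += (s, e) =>
        {
            Console.WriteLine("Completed");
            done.Set();
        };

        worker.RunWorkerAsync();
        done.WaitOne();
        // With no SynchronizationContext, "Completed" can appear before,
        // after, or interleaved with the progress output.
        Thread.Sleep(500); // give any late progress callbacks a chance to run
    }
}
```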


I hope the above analysis makes sense.

Resource name can not be used more than once

Recently I came across the error "Resource name cannot be used more than once". Apart from this, the error message did not show any other information: no line number, no file name, nothing. Generally such errors come up when there is a duplicate key in a resource file, but in my case I was not using any resource file, so there was no chance of duplicate keys either. I searched many times for some online help, but no luck :(


One thing I noticed was that the error was thrown after building my solution (it was in VS2010) 3-4 times continuously. Please note, I was just building the solution, without modifying my code or any of the files. Still I didn't get any clue.


So finally I decided to remove projects from my solution one by one and build. I had removed 4-5 projects and still had no clue, when suddenly I noticed that an obj folder had been added to my Solution Explorer. This obj folder holds temporary files, including a few .resources files. Then I realized the entire issue was due to this obj folder: whenever we build the solution, Visual Studio tries to create some files, and in my case those files were already part of the project via that obj folder.


Even now I am not sure how that obj folder got added to my Solution Explorer. Probably I clicked "Include In Project" by mistake, as the "Show All Files" option was also enabled.


But finally I was able to figure out the cause and thought I would share it here.
Hope it will help you !!!



Saturday, May 12, 2012

Matching braces in code

In day-to-day life, developers write huge pieces of logic involving many braces ({, }) in code. Reaching the end/start of a condition gets complex as the lines of code increase. To simplify this, one can use the key combination Ctrl+].


To use the given key combination, place the cursor on any brace and hit Ctrl+]. If the brace is a closing brace, the cursor will move to the matching opening brace of the condition, and vice versa.


The same key combination can also be used to navigate to the matching comment delimiter (/*, */) or region (#region/#endregion). In these cases, the cursor should be placed on the comment or the region respectively.


Hope this helps !!!

Wednesday, April 25, 2012

How throw works in .Net


As we all know, exception handling plays a very important role in any developer’s life. When talking about exception handling, throw is the first thing that comes to mind. Today, we will see how throw actually works.





The given code catches the exception and just throws it again, without passing any explicit Exception object. 
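In case the screenshot does not render, a representative version of that first snippet looks like this (the Employee class and its DoWork method are illustrative stand-ins for the code in the image):

```csharp
using System;

class Employee
{
    public void DoWork()
    {
        // Something inside the method fails.
        throw new InvalidOperationException("Work failed");
    }
}

class Program
{
    static void Main()
    {
        try
        {
            new Employee().DoWork();
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
            // Plain 'throw' compiles to the IL 'rethrow' instruction:
            // the original stack trace (including the DoWork frame) is preserved.
            throw;
        }
    }
}
```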







Now, let’s take another version of this above code:




This code creates the Employee object, catches the exception, and from the catch block throws the caught exception via ex (our Exception class object).
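Again, a representative version of the second snippet (the Employee class is an illustrative stand-in, as before):

```csharp
using System;

class Employee
{
    public void DoWork()
    {
        throw new InvalidOperationException("Work failed");
    }
}

class Program
{
    static void Main()
    {
        try
        {
            new Employee().DoWork();
        }
        catch (Exception ex)
        {
            // 'throw ex' compiles to ldloc + throw: the same exception object is
            // thrown again as if it originated here, so its StackTrace is reset
            // to this line and the DoWork frame is lost.
            throw ex;
        }
    }
}
```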




Now the question is how these two code snippets differ. For more analysis, let’s open ILDASM and drop the .EXE into it. For the first snippet, we will see something like below:









From this image, we can see ex (the Exception class object) has been defined as a local variable using .locals, but in the catch block the compiler turns the throw statement into rethrow. This means that instead of replacing the original stack trace, the compiler just re-throws the existing exception and preserves it.

Whereas, if we look at the second snippet:










Here also ex is defined as a local variable, but the catch block is a bit different compared to snippet 1. In the catch block, the compiler loads the value from location 1 (ldloc.1), which is ex (the Exception object), and throws it. As a result, ex will not hold the stack trace raised earlier; it only holds the stack trace from the current point onwards.

So it is clear that throwing ex overwrites the stack trace, whereas a plain throw statement does not.


Monday, April 23, 2012

Finalize in .Net


We implement the Finalize method to release unmanaged resources. First, let’s see what managed and unmanaged resources are. Managed resources are those created in .NET code and tracked by the CLR. But when we use code written outside .NET, such as a VB 6 component, a Windows API call, or a COM component, the resources it holds are unmanaged. And there is a very high chance that we use some Win32 API or COM component in our application. Since unmanaged resources are not managed by the CLR, we need to handle them on our own: once we are done with them, we need to clean them up, and that cleanup and release is done in Finalize(). If your class is not using any unmanaged resources, then you can forget about Finalize(). But the problem is, we cannot call Finalize() directly; we have no control over it. Then who is going to call it? The GC does.
One more thing to remember: in C# there is no Finalize keyword that we write and implement. We define Finalize by defining a destructor, which is used to clean up unmanaged resources. When you put a ~ sign in front of the class name, that member is treated as the destructor. When the code is compiled, the compiler converts the destructor into an override of Finalize, and the garbage collector will later place finalizable objects on the finalization queue. Let’s take this sample code:

class A
{
    public A() { Console.WriteLine("I am in A"); }
    ~A() { Console.WriteLine("Destructor of A"); }
}

class B : A
{
    public B() { Console.WriteLine("I am in B"); }
    ~B() { Console.WriteLine("Destructor of B"); }
}

class C : B
{
    public C() { Console.WriteLine("I am in C"); }
    ~C() { Console.WriteLine("Destructor of C"); }
}
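A small driver can be added to watch the finalizers run (a sketch; finalizer timing and ordering are not guaranteed by the runtime, so treat the destructor output order as indicative only):

```csharp
using System;

class A
{
    public A() { Console.WriteLine("I am in A"); }
    ~A() { Console.WriteLine("Destructor of A"); }
}

class B : A
{
    public B() { Console.WriteLine("I am in B"); }
    ~B() { Console.WriteLine("Destructor of B"); }
}

class C : B
{
    public C() { Console.WriteLine("I am in C"); }
    ~C() { Console.WriteLine("Destructor of C"); }
}

class Program
{
    static void CreateInstance()
    {
        // Constructors run base-first: A, then B, then C.
        new C();
        // The instance becomes unreachable when this method returns.
    }

    static void Main()
    {
        CreateInstance();

        // Force a collection and wait for the finalization queue to drain;
        // the finalizer chain runs derived-first: C, then B, then A.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("Done");
    }
}
```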

Now, using Reflector, let's see whether the destructor really gets converted to Finalize:










And wow, it’s really done. Here we can see that there is nothing like a destructor in the compiled code; the destructor has become an override of the Finalize method.

Hope it helps !!!

Saturday, April 21, 2012

Memory Leak Analysis for .Net application


Memory leaks in .NET applications have always proven to be a nightmare for developers. Many times we get an “OutOfMemoryException”, which is often due to nothing but a memory leak. There are many causes that lead to a memory leak situation: for example, forgetting to release unmanaged resources, not disposing heavy objects (e.g., drawing objects), or even holding references to managed objects longer than necessary.
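As a tiny illustration of that last case (a contrived sketch; all the names here are made up): a static event that subscribers never detach from keeps every subscriber alive for the lifetime of the process.

```csharp
using System;

static class Ticker
{
    // A static event lives for the whole process lifetime, so every
    // subscriber it references stays reachable and is never collected.
    public static event EventHandler Tick;
}

class View
{
    private readonly byte[] _buffer = new byte[1024 * 1024]; // heavy per-instance state

    public View()
    {
        Ticker.Tick += OnTick; // subscribe...
    }

    private void OnTick(object sender, EventArgs e) { }

    // ...but nobody ever calls Ticker.Tick -= OnTick, so "closed" views
    // still hang off the static event and their buffers leak.
}

class Program
{
    static void Main()
    {
        for (int i = 0; i < 100; i++)
        {
            new View(); // each one stays reachable via Ticker.Tick
        }
        GC.Collect();
        // All 100 View instances (and their 1 MB buffers) are still alive here.
        Console.WriteLine("Collected, but the views are still rooted by the event");
    }
}
```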

If the application is small, one can analyze the code and figure out which object is causing the memory leak. But for a large application, it is not at all possible to do this manually. In that case we need a tool that can point us to the area or object causing the leak. So today I surfed the internet and came up with a tool called .NET Memory Profiler, which can do the analysis for us and give us statistics for all the instances.

Ok, instead of getting deeper into theory, let’s jump quickly into the demo. I have a Windows Forms application named “MemoryLeakAnalysis”. Now I open the memory profiler, which comes up with the screen below. The profiler can be run in two different modes: interactive (the normal mode with the UI shown below) and non-interactive (used only for automated testing as part of a script; it shows no window).
Click on ‘Profile application’ and select the exe of your application, as shown below. If required, command-line arguments can also be provided.
On clicking Next, you can choose the profiling level: Very low, Low, Medium, High, etc. Moving further, you can also decide whether to enable the unmanaged resource tracker (which collects information about handles, GDI handles, etc.), and finally click Start. Clicking Start will launch your application (here my application name is Test Leakage).
On the right-hand side, you can see various options: Collect snapshot, Stop profiling, Show real-time data. Just below that, we have ‘Investigate memory leaks’. On clicking ‘Investigate memory leaks’, you will get the list of major steps that need to be taken in order to analyze the leakage.





















Now the actual investigation starts.
1)     Perform initial operation - Perform the operation you suspect is leaking memory (e.g., open a document, work with it, and then close it). Performing an initial operation makes sure that instances that are only created once are not included in the memory leak investigation. In my case, I’ll click the ‘Start Memory Leak’ button and, after a while, click ‘Stop Memory Leak’.
2)     Collect base snapshot - The base snapshot will be used as a reference when looking for unexpected new instances created by the operation. Once the snapshot is taken, we get the screen below, with some statistics.
















3)     Perform operation again - We perform the suspected leaking operation again, because this gives us a new snapshot for comparison. In my application, I will again click the ‘Start Memory Leak’ button:









4)     Collect primary snapshot - The primary snapshot will be used when investigating new instances that might be part of the memory leak.
5)     Identify the types with New instances - The instances shown under the Overview tab (highlighted) are the ones that have not been garbage collected.








6)   Identify the types which are not expected to have New instances - For those types, we will find that the value of the New column is 0, which clearly states that those objects have already been collected by the GC.
7)    Investigate root path - The root path can be extremely useful for identifying memory leaks. The shortest path provides information about why instances are not garbage collected. You can use the browse buttons to locate a root path you’d like to investigate further.












8)     Determine whether the root path instance is part of the memory leak - The instance graph and allocation call stack provide information about how the instance is used, why it has not been garbage collected, and how it was created. This information can be used to determine whether the instance is part of the memory leak or not.













9)     Steps 6 to 8 can be repeated to analyze other types.

So, looking at the instance graph and the red arrows shown above will help us identify where exactly the leak is happening.