Saturday, 14 October 2017

Concepts on Database Normalization

Normalization -


Normalization is a process to reduce redundancy, ensure data integrity using functional dependency.
Databases which are not normalized frequently creates problem like Insertion anomaly, Deletion anomaly and modification anomaly of data and hence creates inconsistency.

In simple words, a single table should store single idea/information.

To understand normalization, it is important to understand the concept of Functional dependency and keys.


Functional Dependency -


Lets assume there is a relation Alpha -> Beta in a relational table.
If the value of alpha is known, using that value you can search the value of beta. This is functional dependency.

For the same value of Alpha,  if you get different values of Beta, then it is not functional dependency.

Closure set of attributes -
This means what all values can be searched for the given value.


Keys  -

Super key - super keys or keys a set of attributes used to uniquely identify a row among a set of rows.

Candidate key - Minimal super key is candidate key.
steps to find a candidate key
1) If it is a super key
2) If any proper subset is a super key, if yes then it cannot be a candidate key, if no then it is candidate key.

In other words, if proper subset of a super key if not a super key is a candidate key.

Primary key - candidate key chosen to identify a row uniquely is called Primary key.

A simple example. Consider the relation below

a -> bcd
ab -> cd
abc -> d
bd -> ab

super key - a, ab, abc, bd
candidate key - a, bd


There are 4 kinds of Normalized forms.


1 NF -

A table is said to be in 1NF if every column contains only single attribute. 1NF does not allow multi value attribute in any column.

1NF increases data redundancy as the number of rows with same column increases. Each row as a whole is unique.


2NF -


R (ab cd) lets say ab is candidate key
c is dependent on b
d is dependent on ab

prime attribute : if ab is candidate key, both a and b are prime attributes
non-prime attributes : attributes which are not a part of candidate key, c and d are non prime attributes.

c is dependent on a part of candidate key. This is called partial dependency.
Non prime attribute depends on a part of candidate key.

A table is said to be in 2NF , if it is in 1NF and there exists no partial dependency.

Issue with 2NF : primary key cant be null . so ab in together cannot be null, but b can be null with a having a valid value. This creates a problem, you cant find the value of c if b is null and a has a valid value.


3 NF -


Transitive dependency -

consider below relation
R(ab cd)
ab -> c
c-> d
A non prime attribute depends on another non prime attribute. This is transitive dependency.

A table is said to be in 3NF if it is in 2NF and there exists no transitive dependency.
That means there should not be any partial and transitive dependency.


Issue with 3NF : If c becomes null, you cant find the value of d.


BCNF -

BCNF stands for Boyce-codd normal form

when a non prime attribute is derived from a prime/non prime attribute. consider below example

R (abc)
ab -> c
c-> b

for every dependency a -> b
a should be super key, then b can be ignored.

More details are here

Wednesday, 27 September 2017

Basics of Git Version Control

Git is a version control software used to manage objects. These objects can be anything from code in any language, to image, to text files. git stores all this information in 'repository'. Git is free and open source. Git is very powerful and easy to learn and use.

Installation : Git

Basic Commands:
Here are some basic git commands which can be helpful from time to time for a newbie or noob.

git init - To create and initialise repository in the current local path/folder
              After issuing this command, it creates a hidden .git folder to store all the information

git status - Once you create some objects in the local repository, this status command can help to track the changes in object.

git add <ObjectName> - This is used to add an object to staging, which basically means it is ready to be pushed to remote repository. You can use 'git add .' to stage all the objects in the path. You can also use wildcard characters here. If you don't want certain files to be tracked. You can create a .gitignore file and specify the details there.

git commit - m "<Your Commit Message>" - This is used to commit the changes

git log - Shows commit history with basic details

git branch <BranchName> - To create new branch. Branches are used when you want to do the change without impact the original file. The name of main branch is 'master'.

git checkout <BranchName> - To switch to a branch.

git branch -d <BranchName> - To delete branch.

git diff <SourceBranch> <TargetBranch> - shows difference between source and target branch.

git merge <BranchName> - First switch to master branch. Then issue this command so that all the changes done in <BranchName> will be merged with master.  When changes conflicts between master and <BranchName> occurs, git informs and puts the changes in the object.

git remote - shows existing remote repositories

git remote - v - shows existing repositories with URL.

git clone <RemoteRepositoryURL> - This will pull all the data from remote repository into local folder. Origin is an alias name of the repository cloned.

git fetch origin - This will fetch if any extra changes from remote repository to local. But it doesn't merge the changes.

git pull origin - It fetches if any extra changes from remote repository to local repository and merges the changes from remote branch to current branch.

git push origin master - This command is issued after all the needed changes are committed to finally submit changes to remote repository origin.

git remote add <Alias> <URL> - to add remote repositories.


I found a very nice explanation about git here.  My git repository is here.









Tuesday, 26 September 2017

Investment - Personal Finance

I always wanted to talk about personal finance. This is something I developed deep interest after being educated from a fellow colleague few years back in the memory lane. That realisation will probably never leave me.

I believe on the principle that money is a need in life but life has more to it. I'd advise to pay attention to this subject and establish a good discipline and awareness in the early years of your job.

Invest in yourself. Be confident, try things and fail, until you discover your passion. And you will see, along the journey you would start making money. Keep also your health in your priority list.

Build a conservative approach of spending what is left after saving. In the long run, this will pay off better. Remember the earlier you start, the better it is, relying on the magic of compounding.

There are multiple options to Invest like Bank FD's, Savings Account, Gold ETF, Bonds, Stocks, Mutual Funds, Tax saving instruments like PPF, NPS etc

If you are a salaried employee, the first step should be to start saving from the early years of your career and build an emergency fund. This amount should always be liquid, so should be kept in savings account or flexible sweep-in FD's. You should have a minimum of 6 months of your living expenses in your emergency fund.

Next, educate yourself to invest in mutual funds or directly in stocks. This is necessary to beat inflation, as investment in mere Bank FD's cannot cope up with time value of money. Start with less money, you will make mistakes no doubt, but learn from them to move on. Over the long run, you would have made your money grow better than it would be in Bank Savings or FD. Never invest aggressively in Equity, never try to time the market.

Invest some part of your savings into Bank RD or FD. For a young salaried employee, this could be a small portion here and more into stocks. As you grow older, you should increase your savings portion in FD's and reduce in Equity. This is to ensure that your capital is safe.

Invest in tax saving schemes like PPF or NPS to avail tax benefits. I'd advise to open PPF account from the first year of your career,  due to a lock in period of 15 years.

Once you have saved a good amount of money, try and invest in real estate. Real estate investments are more conservative but requires a huge capital.

All the best. The key is to start early and remain disciplined.

What good is an investment if it doesn't let you sleep at night ? Always remember to smile, be generous and enjoy your life.


This post is a maiden attempt to thank my mentor (Sapan Gupta), a many hundred times for providing the spark and encouragement in the field of Personal finance.



Saturday, 2 September 2017

Statically typed vs Dynamically typed Languages

Statically typed or dynamically typed are basically features of the programming language.

A statically typed language checks for type checking at compile time, whereas in the dynamically typed language the type checking is done at run time.

Some statically typed languages are C, C++, Java etc, whereas Python, Ruby and Perl are dynamically typed languages.

In statically typed languages, bugs due to variable type declaration can be detected earlier at compile time. Whereas in dynamically typed language, these bugs are only detected at run time.

For example if you try and add a string to a number  in C++, the program wouldn't compile and throw error upfront

'2' + 7  would throw error at compile time in C, C++ or Java

Whereas in Python the same code doesn't throw error until and unless run time execution has reached that particular piece of code in question.

Wednesday, 5 April 2017

Clock Angle Problem

While reading the bible for programmers 'Cracking the Coding Interview', I came across an interesting problem of determining the angle between the hour and the minutes hands of an analog clock, given a time.

This drew my attention and I began looking into it. Here is a small explanation about the approach to solve this problem.

What we would need to do is to calculate the rate of change of angle in degrees per minute. The first step to take is to recall that hour hand can turn 360 degree in 12 hours ( that is 12 * 60 = 720 mins)

Rate of change of angle for the hour hand is = 360/720 = 0.5 degrees per minute

Similarly, the minute hand of the clock turns 360 degrees in 60 minutes
So rate of change of angle for the minute hand is = 360/60 = 6 degrees per minute

For a given time X hrs and Y mins

Angle for hour X = 0.5 * ( X *60 + Y)
Angle for minutes Y = 6 * Y

We want to calculate the angle between the hour and the minute hand. So we should be interested in calculating the difference between Angle for hour X and Angle for minutes Y

Difference of Angle between X hrs and Y mins is Z = 0.5 * ( X *60 + Y)  -  6 * Y

The angle between the hour and the minute hand will be the smaller of (360- Z,  Z)


Sunday, 2 April 2017

Getting things done - David Allen

Getting things done or GTD is time management method which are proven successful.
Some quotes from GTD are as below.

#Mind is not for storage. It is to create/have great ideas.

#To have Mind like Water. Be appropriately engaged.

#Carve out mental space.


Important steps in GTD System:

1. Collect
2. Process
3. Organize
4. Review
5. To do 

Integration Broker in peoplesoft

This is an attempt to write in simple words about Integration broker in PeopleSoft.

Integration Broker in short IB, is a tool provided by PeopleSoft to interact between different systems both internal (PeopleSoft systems) and external third party systems.

The interactions can be asynchronous (fire and forget) or synchronous (suspend activity and wait for resource). It is used to invoke PeopleSoft business logic as a web service to outside world or vice versa.

IB has two components : IB Gateway and IB Engine

IB Gateway resides in web server and takes care of  receipts and delivery of messages among systems, whereas Integration Engine resides in App server and takes care of sending/receiving messages to Integration Gateway. It doesn't communicate directly with outside world.

Setting up IB :

1. Activate publish/subscribe processes.
This can be done by starting PSADMIN found in [PS_HOME]\appserv\psadmin

It is a Command line utility, which gives you the option to start pub sub servers.

Pub sub servers are needed for asynchronous messages and not synchronous messages.

2. Configure Gaeway to establish link between webserver and appserver
Here you can configure and choose the gateway which will be used for outbound traffic.

This is done by putting correct server and port in the url in gateway page of IB setup. Also connectors can be loaded here.
Navigation: Peopletools >> IB >> Configure >> Gateway

3.  Configure Node
Here you can configure node which should connect to Integration Gateway.
Provide authentication userid and password. Default user is PS
Setup Gateway properties

In the property file, log level can be set to 5. Verify that password is encrypted or substitute with encrypted password.

4. Setup web service configuration

Here you can setup namespace and target location. Information entered here will be used during setting up XSD's and WSDL.

5. Activate IB Domain
Activating pub/sub processes in IB

Peopletools >> IB >> Configuration >> Quick Configuration

Inside Domain status, perform below actions
click purge domain status
select all domains active
click update

6. Configure Internal nodes

Internal nodes like Anonymous and WSDL_NODE can be configured to put PS as default user id
This completes IB setup.






Word Anagrams Project - Python

Objective : 
To find Anagrams words in English dictionary. Two words are said to be anagrams if the letters can be rearranged to turn one word into another.

Example stop and pots are anagrams to each other.


Steps:
word = open ('words' , 'r')  -- Opens the word file

wordlist =  word.readlines() -- Reads the lines and put them in list

len(wordlist) -- Count the elements in the list

-- Get rid of newline character and lowercase the words

wordclean = [wordlist.strip().lower() for word in wordlist]

-- There still may be duplicates. Use set to remove duplicates
-- Set can have only one instance of any given object
-- and then convert back to a list

wordunique = list(set(wordclean)) -- Duplicates will be removed now, but sorting would be lost

wordunique.sort() -- sort the list

-- The above data transformation could have been done using list comprehension in easier way

wordclean = sorted(list(set([word.strip().lower() for word in open('words' , 'r')])))

-- Finding Anagrams

-- for anagrams , use the fact that sorted(pots) ==  sorted(stop)

def  signature(word):
    return ' '.join(sorted(word))

def anagram(myword):
    return [word for word in wordclean if signature(word) == signature(myword)]

-- Test it

anagram('dictionary')

-- This solution however is very costly, check with %timeit to see how long it takes

-- Need to find faster way to get anagrams for all the words in the file.
-- Creating signature is expensive if we keep repeating for every word in the file

-- A good idea would be to create a python dictionary of all the words indexed by their signature
-- Then getting an anagram of a given word would be as simple as looking inside the dictionary

-- Every item in dictionary would have signature as the key and a list of words as values
-- Loop through the list of words and append to the right item with signature as the key.

words_bysig = { }

for word in wordclean:
    words_bysig[signature(word)].append(word)

-- You may get an error when it tries to add an item 'a' that there is no such item in the dictionary

-- We make use of collections module where there is a default dict object
-- It provides a default value if we try to get a key which doesn't exist.
-- The default value what we want is empty list

import collections

words_bysig = collections.defaultdict(list) -- yields an empty list as the default value


for word in wordclean:
    words_bysig[signature(word)].append(word)

-- Now find anagrams by simple dictionary lookup

def anagram_fast(myword):
    return words_bysig[signature(myword)]

-- Test it


anagram_fast('dictionary')

-- get all the anagrams in the dictionary excluding the trivial ones where the word is an anagram of itself

-- build this using dictionary comprehension

anagram_all = {word : anagram_fast(word) for word in wordlist if len(anagram_fast(word) > 1}

-- Test it with %timeit, it should be relatively very fast