Hi Friends,

Even as I launch this today ( my 80th Birthday ), I realize that there is yet so much to say and do.
There is just no time to look back, no time to wonder,"Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Wednesday, 10 February 2016

JOB ADVT DATA MINING

Dear  Deepak :

(  Julia  Computing  )

--------------------------------------------

I write to you as suggested by Viral Shah ( thru Linkedin )

Some 2.5 years back , I had tried to get Ms Rohini Damahe ( Lecturer - L&T  Institute of Technology ) , to take up a DATA  MINING project for her MS studies . But this did not work out

I wonder if Julia Computing would want to do this - as a Service to the Nation

What I have in mind , is explained in the attachment

Feel free to write / phone for any clarifications

with regards,



hemen  parekh

www.hemenparekh.in > Blogs > Towards a National Job Portal
                              > Reports > www.ResumesExchange.com

Marol , Mumbai , India


10  Feb   2016



-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Proposal  for  Julia Computing  
From :  hemen  parekh  /  hcp@RecruitGuru.com  /  (M) 0 - 98,67,55,08,08  / www.hemenparekh.in
10  Feb  2016
Mumbai
------------------------------------------------------------------------------------------------------------------------------------------


07  May  2013
------------------------------
Dear Rohini

During our discussions yesterday , you expressed your desire to work on some project involving Data mining
At that time , I mentioned that we have a database of over 5 million job advts , downloaded over the past 6 / 7 years from various job portals of India
Each job advt record consists of :

Ø  Advt ID
Ø  Designation ( being advertised )
Ø  Company Name
Ø  Job Description
Ø  Desired Profile
Ø  Compensation
Ø  Experience ( desired ) – Years
Ø  Industry Type
Ø  Education
Ø  Location ( Posting City )
Ø  Keywords
Ø  Post Date
Ø  Expiry Date

Some years back , ( when our website , www.World-Wide-Jobs.com , was up and running ) , we had developed a feature to analyze this database and display the findings visually , in different ways
We were displaying PIE-CHARTS of :

Ø  Industry-wise Jobs

Ø  City-wise Jobs

I attach a Sample

You will observe that , with a much larger database available now , it is possible to analyze / display the “ No of Jobs “ , in many more ways
Not only that , it should be possible to analyze this huge database to predict the future expected PATTERN of the occurrence of jobs , in many different ways !
Beyond that , it should be possible to evolve some sort of an EXPERT SYSTEM , by extracting patterns that tell us ,
Ø  IF this , THEN that
Ø  IF this , THEN not that
Type of DECISION RULES !
I have already written down a few such possible RULES , that I can send you later , in case you wish to take up this project
If you do , I can even prepare U/I of a web page that will enable any visitor to search such co-relations amongst various data fields
Considering that , currently , we are downloading approx 1,000 job advts EVERY DAY , this would refine and improve as time goes by
Pl let me know in case of interest

hcp
---------------------------------------------------------------------------------------------------------------------------

07  June  2013

Rohini

At any given time , the number of jobs getting advertised , is an important Economic Indicator
If economy is booming and company Order Books are getting fatter , then more jobs will get advertized – and vice-versa
Hence , a time-series analysis of the no of new jobs getting posted on job portals , has a  straight line relationship with the state of the economy ( a high co-efficient of correlation )
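The "straight line relationship" mentioned above can be checked by computing the co-efficient of correlation between the monthly count of new job advts and any economic indicator. Here is a minimal sketch in Python, assuming the two series are already aligned month by month. The function name `pearson` and all the figures are made up for illustration; they are not taken from the actual database.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson co-efficient of correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)

# Invented monthly figures, for illustration only
new_jobs_posted = [30000, 28500, 26000, 24000, 21000, 18000]
economic_index = [102.0, 101.2, 100.1, 99.0, 97.8, 96.5]

r = pearson(new_jobs_posted, economic_index)
print(f"co-efficient of correlation = {r:.3f}")
```

A value close to +1 or -1 would support the claim; a value near 0 would not.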
Apart from that , can a Data mining of 5 million jobs , answer ( even partially ) , the following questions ?

Ø  Who ( which Companies ) are advertizing and when

Ø  What jobs / vacancies / positions are being advertized

Ø  What is the frequency with which a particular job gets advertized ? By entire industry ? By a given Company ?

Ø  Which regions / cities have max / min no of new jobs

Ø  What are regional disparities due to

Ø  Which Industries are advertising most – creating most jobs

Ø  What Edu Qualifications are in max demand

Ø  What kind of jobs demand what kind of Edu Qualifications

Ø  What is the level of co-relation between , Position and the years of Experience demanded

Ø  For identical positions being advertized , how much do “ Job Descriptions / Desired Profiles “ differ, from company to company

Ø  Are there significant differences in the “ No of years of Experience “ being demanded , for identical positions

Ø  What is the probability of finding the “ Keywords “ in “ Job Description / Desired Profile “

Ø  What is the extent of duplication ( redundancy ? ) between , “ Job Description “ and “ Desired Profile “

Ø  What percentage of Advts fail to make any mention of , Compensation Offered

Ø  When a company posts an advt for same / identical position , at different points of time , are there any differences in values ( fields )

Ø  From an analysis of all the advts posted by a given Company ( over past 7 years ) , can any conclusion be reached as to the changing nature of that company’s business ( by co-relating the “ Skills related Keywords “ )

Ø  Can the algorithm predict what job a company will advertize next – and when

Ø  Is there any correlation between , “ Designation / Position “ and the “ Keywords “

Ø  From analyzing this huge data , can software auto-generate , a complete / editable job advt , as soon as a Recruiter simply types the “ Designation / Position “
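Some of the questions above need nothing more than counting. As a rough sketch, assuming each advt is held as a Python dictionary, the following covers three of them : the percentage of advts without Compensation , the share of Keywords found in the text fields , and the word overlap between Job Description and Desired Profile. The key names ( `job_description` , `desired_profile` , `keywords` , `compensation` ) and the two sample advts are invented for illustration.

```python
import re

def tokens(text):
    """Set of lower-case word tokens in a free-text field."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))

def pct_without_compensation(ads):
    """Percentage of advts whose Compensation field is empty."""
    missing = sum(1 for ad in ads if not ad.get("compensation", "").strip())
    return 100.0 * missing / len(ads)

def keyword_hit_rate(ad):
    """Fraction of an advt's Keywords that also occur in its
    Job Description or Desired Profile."""
    body = tokens(ad["job_description"]) | tokens(ad["desired_profile"])
    keywords = [k for k in ad["keywords"].split(",") if k.strip()]
    if not keywords:
        return 0.0
    hits = 0
    for k in keywords:
        kt = tokens(k)
        if kt and kt <= body:  # every word of the keyword appears in the text
            hits += 1
    return hits / len(keywords)

def overlap(ad):
    """Jaccard overlap between the words of Job Description and Desired Profile."""
    a, b = tokens(ad["job_description"]), tokens(ad["desired_profile"])
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Invented sample advts, for illustration only
ads = [
    {"job_description": "Gather requirements and prepare SQL reports",
     "desired_profile": "MBA with SQL and Excel skills",
     "keywords": "SQL, Excel, Tableau",
     "compensation": ""},
    {"job_description": "Plan production and manage shop floor",
     "desired_profile": "BE Mechanical with shop floor experience",
     "keywords": "production, planning",
     "compensation": "8-10 lakhs"},
]

print(pct_without_compensation(ads))
print(keyword_hit_rate(ads[0]), overlap(ads[0]))
```

Averaging `keyword_hit_rate` and `overlap` over all advts ( or per Company / Industry ) would give the probabilities and the extent of duplication asked about above.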

I believe , so far , no one has undertaken such a Data mining project

If carried out diligently , I am sure , the outcome would be of immense benefit to :

Ø  HR Managers……………….. ( for Manpower Planning / Compensation Planning )

Ø  Recruiting Managers…………( for framing Man Specifications / Job Description Manuals )

Ø  Educationists…………………( for deciding what Edu Quali are in demand and tailor the Courses )

Ø  Students ……………………..(  to figure out what “ Skills “ are in demand by Industry and prepare )

Ø  Planning Commission………(  for allocating Resources to States / Regions , based on imbalances )


Ø  HRD Ministry ………………….(  for long term Macro-Planning in respect of Education )

Ø  National Skills Development Commission ………( for chalking out Skills Development Programs in collaboration with Companies / Industries )

If undertaken – and executed seriously – then this Data mining project has the potential to place LTIT on the Centre-Stage of National Education Planning Scenario
I do hope , you will consider my proposal sincerely


Regards

hcp
----------------------------------------------------------------------------------------------------------------------------

15  July  2013

Rohini

I am glad that you liked the idea of Data mining of 5+ million job advts
What can / will such a project yield ?
Without exaggerating , it would be safe to assume that , this vast database of job advts would contain :
Ø  50 million phrases / sentences
Ø  500 million words

Obviously , each word / phrase / sentence , is nothing more than a “ Database of Intentions “ of the Employer Companies ( to borrow from John Battelle’s well-researched book about Google )
Our goal shall be to make this ( Data mining Algorithm ) a dynamic / continuous “ Process “ , so that , we can measure the changing nature of these “ Intentions “ , over a long , long period
And we must enable a “ Researching Visitor ( of our web site ) “, to benefit from these trends / patterns
If your Guide approves of this project , we will sit down to draw up a plan , along with Shuklendu
In the meantime , I attach a very small list of the Words / Phrases / Sentences , that I had manually compiled some 16 years ago
Regards

hcparekh 
-------------------------------------------------------------------------------------------------------------------------------

18  July  2013


GOOGLE  N-GRAM  PROJECT
( The Google Books Ngram Viewer graphs how often case-sensitive , comma-separated phrases occur in a chosen corpus of books , between two chosen years , with optional smoothing . The raw data is available for download . )

--------------------------------------------------------------------------------------------------------------------------------------------

18  July  2013


Rohini

Even though 5 million job advts may contain 500 million “ words “ , these are not Unique

Most of these are used again and again , hundreds or thousands of times

Thru data mining , it is not difficult to compute their “ Frequency of Usage “

And then , these frequencies can be graphically plotted against any particular time-period

Such Graphical Representations can be further broken up by ,

Ø  City Names

Ø  Company Names

Ø  Industry Names

Ø  Function Names

Ø  Designations ( Vacancy Names ).. etc

And such graphical analysis can be done , not only for “ Keywords “ but even for “ Key Phrases “ and “ Sentences “ !
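The frequency counting described above can be sketched in a few lines of Python. This is only an illustration, assuming each advt is a dictionary with an ISO-format `post_date` ( such as 2013-07-18 ) and a free-text `job_description`. Those key names , the function `usage_over_time` and the sample advts are all hypothetical.

```python
import re
from collections import Counter

def usage_over_time(ads, word, group_by=None, field="job_description"):
    """Count occurrences of a word per month, optionally broken up by
    another field such as 'city' or 'company'."""
    counts = Counter()
    word = word.lower()
    for ad in ads:
        month = ad["post_date"][:7]  # 'YYYY-MM' prefix of an ISO date
        n = re.findall(r"[a-z]+", ad[field].lower()).count(word)
        if n:
            key = (month, ad[group_by]) if group_by else month
            counts[key] += n
    return counts

# Invented sample advts, for illustration only
ads = [
    {"post_date": "2013-06-05", "city": "Mumbai",
     "job_description": "Java developer with Java and SQL"},
    {"post_date": "2013-06-20", "city": "Pune",
     "job_description": "SQL analyst"},
    {"post_date": "2013-07-02", "city": "Mumbai",
     "job_description": "Senior Java architect"},
]

print(usage_over_time(ads, "java"))
print(usage_over_time(ads, "java", group_by="city"))
```

The resulting counts are what a graph would plot against the time axis. Key Phrases and Sentences could be counted the same way by matching runs of consecutive words instead of single words.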

Regards

 

Hcp

-------------------------------------------------------------------------------------------------------------------------------------------------
22  July  2013

Rohini


Take a look at this project paper

It is all about data mining of some 150 million records ( location points ) and about uncovering “ trends / patterns “ of physical movements of 300 human volunteers , over a “  period of time  “

I quote from article in Times of India ( 19 July 2013 ) :

“ ..the first system of its kind to predict long term human mobility in a unified way , parse the data. Far Out does not need to be told exactly what to look for  --- it automatically discovered regularities in the data “

“ Do you know precisely where you’ll be 285 days from now at 2 pm ?

Researchers have developed a new tracking software that can tell you exactly where you will be on a precise time and date , years into the future “

What we want to do with 5 million job advts database , is quite similar – viz ; predict WHO ( which Company / Industry ) , will advertize WHAT ( vacancies / positions / designations ) and WHEN ( time )

It is do-able !
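As a very crude first baseline ( not the method of the Far Out paper , and far simpler than a real prediction model ) , one could assume that a Company re-advertizes a given position at roughly regular intervals. The sketch below guesses the next post date as the last post date plus the average gap between past postings. The key names , the function `predict_next_posting` and the sample company "ABC Ltd" are invented for illustration.

```python
from collections import defaultdict
from datetime import date

def predict_next_posting(ads):
    """For each (company, designation) seen at least twice, guess the next
    post date as: last post date + average gap between past postings."""
    history = defaultdict(list)
    for ad in ads:
        key = (ad["company"], ad["designation"])
        history[key].append(date.fromisoformat(ad["post_date"]))
    forecast = {}
    for key, dates in history.items():
        if len(dates) < 2:
            continue  # a single posting gives no interval to extrapolate from
        dates.sort()
        avg_gap = (dates[-1] - dates[0]) / (len(dates) - 1)
        forecast[key] = dates[-1] + avg_gap
    return forecast

# Invented posting history, for illustration only
ads = [
    {"company": "ABC Ltd", "designation": "Business Analyst", "post_date": "2013-01-10"},
    {"company": "ABC Ltd", "designation": "Business Analyst", "post_date": "2013-03-11"},
    {"company": "ABC Ltd", "designation": "Business Analyst", "post_date": "2013-05-10"},
]

for (company, designation), when in predict_next_posting(ads).items():
    print(company, "may advertise for", designation, "around", when)
```

Comparing such guesses with what actually gets posted afterwards would show how much better a proper model needs to be.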

Regards

hcp
-----------------------------------------------------------------------------------------------------------------------------------------------------------

31  July  2013

Rohini



No problem

Based on all the emails that I have sent so far , you should prepare an outline of the Data mining Project

That paper would help all of us to know , in advance , what to expect when the project gets completed ( hopefully , by Dec 2013 ? )

As explained to you over phone yesterday , this “ Data Mining and trend / pattern generation “ must happen online on www.CustomizeResume.com

Quite likely , we , currently have some 3 million job advts in CustomizeResume web site

By a copy of this email , I am requesting Shuklendu , to add to this , another 3 million job advts which are available in www.IndiaRecruiter.net web site

And , since this database ( of job advts ) keeps growing at approx 1,000 per day , the software that you develop and install on CustomizeResume web site should be such that , the trends / patterns / search results etc , are generated dynamically / on-the-fly , any time a visitor selects any given,

Ø  Search Criteria ( Industry / Company / Position / Time Period etc )

Ø  Tabular or Graphical Display ( graphs are critical to visualizing trends / patterns )
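The "on-the-fly" behaviour described above amounts to filtering the advts by whatever criteria the visitor selects and then counting per time period. A minimal sketch , assuming advts are dictionaries with an ISO-format `post_date` ; the function `trend` , the key names and the sample data are hypothetical. On a live web site this would more likely be a GROUP BY query against the database than a loop in memory.

```python
from collections import Counter

def trend(ads, period="month", **criteria):
    """Count advts per period among those matching every given criterion.

    criteria are exact field matches, e.g. industry="IT", city="Mumbai".
    """
    width = {"year": 4, "month": 7, "day": 10}[period]  # prefix length of an ISO date
    counts = Counter()
    for ad in ads:
        if all(ad.get(field) == value for field, value in criteria.items()):
            counts[ad["post_date"][:width]] += 1
    return sorted(counts.items())

# Invented sample advts, for illustration only
ads = [
    {"post_date": "2013-06-05", "industry": "IT", "city": "Mumbai"},
    {"post_date": "2013-06-20", "industry": "IT", "city": "Pune"},
    {"post_date": "2013-07-02", "industry": "IT", "city": "Mumbai"},
    {"post_date": "2013-07-15", "industry": "Pharma", "city": "Mumbai"},
]

print(trend(ads, industry="IT"))
print(trend(ads, period="year", city="Mumbai"))
```

The returned list of ( period , count ) pairs can feed either a table or a graph , whichever the visitor selects.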

If there are any questions , feel free to phone me

I hope , you could talk to Shuklendu re your technical queries

Regards

Hcp


CC: Shuklendu

We should seriously consider reviving , www.IndiaRecruiter.net

About a year back , while talking about this ( revival ) , Nitin mentioned that , it may take 2/3 hours to “ connect-up “ the software code ( available with you , in the back-up taken at Reliance Server Farm ), with the databases of IndiaRecruiter

-----------------------------------------------------------------------------------------------------------------------------------------

02   Aug   2013

Shuklendu


Ø  Demand of the Project

I do not understand what is meant by “ Demand “ !
I presume , this has nothing to do with the “ Market Demand “ – as for a product or a service
If , what is meant is , what is the “ Object “ or “ Purpose “ of this project , then , that has been amply explained in my 4/5 earlier emails ( with copies to you ) sent to Rohini
Very briefly stated , the purpose is for the software to be able to “ Predict “ , WHO ( which Company or which Industry ) will advertise for WHAT ( Vacancy / Position / Designation ) and WHEN ( specific time in future )
The software will accomplish this by examining / analyzing millions of Job Advts thru PARSING / INDEXING its contents and graphically plotting Trends / Patterns , along a “ Specified Time Axis “

The Contents are :

·        Every field of an Advt

·        Millions of Sentences / Phrases / Keywords , contained in those advts and computing their Frequencies of Occurrences


Ø  Deadline

Dec 2013


Ø  Software Tools / Languages to be used

You are best placed to advise Rohini re this .

From “ Availability to the Users “ point-of-view , this project / feature must work on CustomizeResume web site . It will be freely available to both , Employers as well as Jobseekers – and without login

Being web site – based , it must dynamically accommodate the inflow of 1000+ job advts getting added to Jobs Database daily.

This is NOT an Enterprise based TOOL

It can be demonstrated directly from the web site only


Ø  Technical Help

I hope you / Nitin can guide Rohini , whenever required , as far as integration with Job Advt Database is concerned.

I have a strong belief that what Rohini develops , will be of immense help to our own team in developing our “ Job Recommendation System “ , for which , you already have with you :

·        A folder containing my various handwritten notes

·        Several past emails , laying down the precise logic

We should , jointly monitor this project , once-a-fortnight , in a face-to-face meeting with Rohini
She should continue to work from LTIT premises ( - unless , you want her to sit at Sentient premises )

hcp


--------------------------------------------------------------------------------------------------------------------------------------

02  Aug  2013

Hello Rohini,

In consultation with Parekh Sir, these are the responses to your queries:

What is the Demand of the Project
I take it that you do not want to know the ‘Market Demand’ of the project (as it is irrelevant for an ME project), but ‘What is Demanded of the Project’. Parekh Sir has already explained the Objective & Requirement of the project in detail in his mails.
In summary, he had said
the purpose is for the software to be able to “ Predict “ , WHO ( which Company or which Industry ) will advertise for WHAT ( Vacancy / Position / Designation ) and WHEN ( specific time in future )
The software will accomplish this by examining / analyzing millions of Job Advts thru PARSING / INDEXING its contents and graphically plotting Trends / Patterns , along a “ Specified Time Axis “

The Contents are :

·        Every field of an Advt

·        Millions of Sentences / Phrases / Keywords , contained in those advts and computing their Frequencies of Occurrences


Deadline of the Project
December 2013

Software Tools and Language
This project is to be part of our existing site www.customizeresume.com. Therefore the same software platform is to be used, which is
·        ASP.Net 3.5
·        MS SQL Server 2005
·        C#
So, you can use Visual Studio 2008 or Visual Web Developer Express 2010. For SQL Server, you can use SQL Server Management Studio.


Place of Development
You can develop it from any place convenient to you, e.g. LTIT, the Institute where you are doing your ME, or home. You may visit our office in Malad for discussion, showing what you have done, troubleshooting, etc.


Demonstrating it Outside
Once the project is approved by Parekh Sir, it will go online on www.customizeresume.com. So, it will be in public domain and anyone can see it. You can send the link to anyone you want to demonstrate to. Parekh Sir is very generous in giving credit where due, so I am sure he will give due credit to you for your efforts.


Who will give Technical Help
You can contact me or my colleague Nitin Ruge for technical help. Nitin can be contacted at nitin.ruge@sentientsystems.net or 022 42666657.


Hope this answers your queries.

Regards
Shuklendu Baji
---------------------------------------------------------------------------------------------------------------------------------------------------

07  Aug   2013

Rohini

Pl ignore my earlier email of today morning – which , I had sent without looking at this
Anyway , Shuklendu’s answers to your queries are satisfactory
During one of our meetings , I had also talked to you about developing an “ Expert System “ , thru discovery of specific “ Co-relations “ amongst various Data Fields of 5 million job advts
Eg :
Ø  What is the Co-relation between , any given “ Designation / Vacancy-Name / Advertized Position “ and “ Educational Qualifications “ ?
Here are some examples :
Ø  Any designation  such as “ Production Manager “ would call for an “ Engineering Degree / Diploma “ ( but never a CS / CA )

Ø  Any designation in “ Finance Function “ will require,
·        B Com
·        M Com
·        CA    etc
       But never a BE(M ) / BE (Chem )

Ø  Any designation at Manager level will call for a minimum experience of 5 years ( but never a Fresh Graduate with NIL experience )

Ø  MBA / BBA / MMS etc are the most preferred Edu Qualifications for positions in Marketing


Ø  No vacancy in an Automobile Manufacturing Company , will call for a degree in Pharmaceutical

Ø  No Electrical Machinery Manufacturing company will ever demand a Medical Degree (MBBS )

To a human mind , these ( rules ) are SOO OBVIOUS !
But , no human mind can write-down ALL of such RULES , in 2 minutes ! – something that your Data mining Software can – and will – do in 5 seconds !

All that you need , after computing “ Frequencies of Occurrences “ , is to :
Ø  Plot the Co-efficients of Co-relations between various Fields ( of job advts )

Ø  Compute Probabilities for each and create hundreds of Probability Tables
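A Probability Table of the kind described above can be built by counting how often each value of one field occurs together with each value of another. The following is a sketch only , assuming advts are dictionaries ; the function names `probability_table` and `decision_rules` , the 0.9 threshold and the sample advts are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def probability_table(ads, given, target):
    """Estimate P(target value | given value) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for ad in ads:
        counts[ad[given]][ad[target]] += 1
    return {
        g: {t: n / sum(c.values()) for t, n in c.items()}
        for g, c in counts.items()
    }

def decision_rules(table, threshold=0.9):
    """'IF given THEN target' rules whose estimated probability
    is at least the threshold."""
    return [
        (g, t, p)
        for g, dist in table.items()
        for t, p in dist.items()
        if p >= threshold
    ]

# Invented sample advts, for illustration only
ads = [
    {"designation": "Production Manager", "education": "BE"},
    {"designation": "Production Manager", "education": "BE"},
    {"designation": "Production Manager", "education": "Diploma"},
    {"designation": "Accounts Manager", "education": "CA"},
    {"designation": "Accounts Manager", "education": "CA"},
]

table = probability_table(ads, given="designation", target="education")
for g, t, p in decision_rules(table):
    print(f"IF designation is {g} THEN education is {t} (p = {p:.2f})")
```

A "never" rule corresponds to a combination that does not appear in the table at all. With a small sample that absence proves little , which is why the growing Sample Size matters.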

And , since a thousand new job advts are getting added to our Job Advt Database , daily , the SAMPLE SIZE is perpetually increasing – thereby , increasing the Accuracies of your Predictions !

Having done this , imagine the following scenario :

Recruitment Officer of Wipro , comes to our “ Post Job “ page and , in the field for “ Designation “ simply types ,
Business Analyst
And Presto !
The entire Job Advt Form gets auto-filled , with MOST PROBABLE values !
Would not that amaze her ?
All that our software has done is analyzed job advts of all “ Software Companies “ ( an Industry ),– and of WIPRO – for the position of Business Analyst and filled in the most probable values
This is no rocket science !
We had actually , partially attempted it – albeit in a crude way – in our earlier web site , www.IndiaRecruiter.net
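The auto-fill scenario can be sketched as picking , for every field , the value that occurs most often in past advts for the same Designation. The sketch below prefers the advts of the Company posting the job and falls back to all Companies when that Company has none ; this fallback is my assumption , not something laid down in the emails. The field subset , the function `autofill` and the sample companies are likewise invented for illustration.

```python
from collections import Counter

FIELDS = ("education", "experience", "industry", "location")  # illustrative subset

def autofill(ads, designation, company=None, fields=FIELDS):
    """Most frequent value of each field among past advts for the designation.

    Advts of the given company are preferred; if it has none for this
    designation, advts of all companies are used instead.
    """
    matches = [ad for ad in ads if ad["designation"].lower() == designation.lower()]
    own = [ad for ad in matches if company and ad["company"] == company]
    pool = own or matches
    form = {}
    for field in fields:
        values = Counter(ad[field] for ad in pool if ad.get(field))
        if values:
            form[field] = values.most_common(1)[0][0]
    return form

# Invented sample advts, for illustration only
ads = [
    {"company": "Company A", "designation": "Business Analyst", "education": "MBA",
     "experience": "3-5", "industry": "Software", "location": "Bangalore"},
    {"company": "Company A", "designation": "Business Analyst", "education": "MBA",
     "experience": "2-4", "industry": "Software", "location": "Bangalore"},
    {"company": "Company B", "designation": "Business Analyst", "education": "BE",
     "experience": "3-5", "industry": "Software", "location": "Pune"},
    {"company": "Company B", "designation": "Business Analyst", "education": "BE",
     "experience": "3-5", "industry": "Software", "location": "Pune"},
    {"company": "Company B", "designation": "Business Analyst", "education": "BE",
     "experience": "3-5", "industry": "Software", "location": "Pune"},
]

print(autofill(ads, "Business Analyst", company="Company A"))
print(autofill(ads, "Business Analyst"))
```

The Recruiter would still be free to edit any of the pre-filled values before posting.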
What surprises me is , how come no one has attempted this so far ! Especially , Naukri / TimesJobs / MonsterIndia , who have accumulated millions of job advts !
Any way , the fact that they have , so far , ignored this “ Line of Examination “ , will work to your advantage – making YOU the very first person in the entire world to come up with a PREDICTION MODEL in the area of JOBS
Let us keep our  HORIZONS way wide
hcp
            
--------------------------------------------------------------------------------------------------------------------------------------

27  Aug   2013

Rohini

I refer to our telecon today morning
From what I understood , your guide would like to know the “ Prior Knowledge “ in this area ( Research Papers )
I did some searching on Google and came across the following
All of these may not be directly / immediately relevant to our project , but these are worth going thru
You may even short-list 3 or 4 , that you may wish to submit to your guide
Next question ( of your guide ) is also very relevant , viz ;
“ What do you expect to achieve thru this project ?  How will it benefit Jobseekers / Employers / Edu Institutions / Policy Makers etc ? “

From my own past experience ( of designing 8 web sites over last 16 years ) , I have found that , this question cannot be satisfactorily answered by writing a long and esoteric description !

The best way to answer is to conceive / design / display , the User Interface  !
That alone can force you to answer :
“ When a visitor arrives on this web page , will he easily understand what he can select / click ?
Will he intuitively expect , what ( texts / figures / graphs ) he will get to see when he takes any action ? “
Pl do prepare U/I and that would convince your guide !

hcp

http://www.cnts.ua.ac.be/papers/2000/extract00.pdf
http://edpath.typepad.com/source_scholars/2013/02/does-labor-market-intelligence-software-along-with-spidering-and-data-mining-of-online-job-advertise.html
http://www.trainingindustry.com/blog/blog-entries/new-providers-of-data-collection-and-analysis-of-online-job-ads-in-real-time-may-be-helping-community-colleges-create-better-curricula.aspx
http://www.youtube.com/watch?v=6LeUiFcfpyw
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5232798&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F52%2F5370751%2F05232798.pdf%3Farnumber%3D5232798
http://www.questia.com/library/1G1-277519055/changing-trends-in-lis-job-advertisements
http://lexicometrica.univ-paris3.fr/jadt/jadt2012/Communications/Fioredistella%20Iezzi,%20Domenica%20et%20al.%20-%20Text%20clustering%20based%20on%20centrality%20measures.pdf
https://www.google.co.in/url?sa=t&rct=j&q=&esrc=s&source=web&cd=38&cad=rja&ved=0CFwQFjAHOB4&url=http%3A%2F%2Fwww.lirgjournal.org.uk%2Flir%2Fojs%2Findex.php%2Flir%2Farticle%2Fdownload%2F499%2F548&ei=aDgcUrW8Osf_rQe_2YDQDg&usg=AFQjCNFLjH3CRQgJe7DmX8sfiO1Ju7rUIA&sig2=N7ObhwbL-Qmf5YRBYOAVRA&bvm=bv.51156542,d.bmk
http://books.google.co.in/books?id=ImJPbmcgF4wC&pg=PA590&lpg=PA590&dq=%22data+mining%22+%2B+%22job+advertisements%22&source=bl&ots=nowszCkygZ&sig=-IC9g67l6q9HqY9eC2H_LHn3VYE&hl=en&sa=X&ei=nDwcUsrAENCmrAemwgE&ved=0CEMQ6AEwAThG#v=onepage&q=%22data%20mining%22%20%2B%20%22job%20advertisements%22&f=false
https://www.google.co.in/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCsQFjAA&url=http%3A%2F%2Fwww.comp.hkbu.edu.hk%2F~william%2Fpapers_slides%2Fcheung_am_icdm02.ps&ei=-T0cUrHlLMfIrQfP9oCwDQ&usg=AFQjCNHwvqNLJP3WLpniR4VX348B6BXG4w&sig2=dUVpsLN0tXXowSTrAgwjVQ
http://etjanst.hb.se/bhs/ith/2-99/ja.htm
http://codecamp.fi/lib/exe/fetch.php/wiki/a_tool_for_visualizing_skill_requirements_in_ict_job_advertisements---preprint.pdf

http://www.bloomberg.com/news/2013-04-03/algorithms-play-matchmaker-to-fight-7-7-u-s-unemployment-jobs.html ( quite interesting ! –hcp )

http://it.vtc.edu.hk/itjobanalysis/
-------------------------------------------------------------------------------------------------------------------------------------------

01  Sept  2013

Rohini 

Following is a list of links from the first page of Google , when you type search term :
Download Data Mining Software
Altogether , there were more than 2 Million results !
Nearly all of these can be downloaded for FREE
After examining , you can decide if any of these can be gainfully employed for our project . If so , go ahead and download
Nothing can be discovered without “ experimenting “ ! Remember , too much of “ Analysis “ , often leads to “ Paralysis “ ! It is important to “ Get Going “ !
In the meantime , I hope you have gone thru the links that I sent to you earlier
In those , could you find any Research Papers that you want to submit to your guide ?
It has been over 6 weeks since we started talking about this project . It is high time we put this in “ Second Gear “ !
hcp

---------------------------------------------------------------------------------------------------------------------------------------------
02  Sept  2013

Rohini

Take a look at the counter for “ Live Jobs “ on ,

http://www.customizeresume.com/Jobseeker/JobSearchConventional.aspx

Today , it reads …. 14,861
Some 6 months back , it read , approx 30,000
This counter is constructed from Jobs RSS Feeds from , Naukri / TimesJobs / MonsterIndia and ClickJobs
Hence , it is fairly representative of the job market in the organized sector
It would have been a fairly simple exercise to plot the daily figures in a graph to reveal the gradually declining no of jobs being advertized
That would not require use of any Data Mining tool
However , without applying some simple data mining tool , it would not be possible to answer the following questions :

Where is the greatest decline of jobs being advertized ? How much is the percentage decline ?

Ø  In which Industry ?

Ø  In which Company ?

Ø  In which City ?

Ø  In which Region ?

Ø  In which Skills ?

Ø  For which Positions ?

Ø  For which Education Levels ? ………… etc

With a data mining tool , such individual graphs could emerge ( within fraction of a second ) at the click of a button !
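The percentage decline per Industry / City / Company etc can be computed by counting advts in two periods and comparing. A minimal sketch , assuming advts are dictionaries with an ISO-format `post_date` ; the function `decline_by` , the key names and the sample counts are invented for illustration.

```python
from collections import Counter

def decline_by(ads, field, earlier, later):
    """Percentage change in advt counts between two months, per value of field.

    earlier / later are 'YYYY-MM' strings; a negative figure means fewer advts.
    Results are sorted so that the steepest decline comes first.
    """
    before, after = Counter(), Counter()
    for ad in ads:
        month = ad["post_date"][:7]
        if month == earlier:
            before[ad[field]] += 1
        elif month == later:
            after[ad[field]] += 1
    change = {
        value: 100.0 * (after[value] - n) / n
        for value, n in before.items()
    }
    return sorted(change.items(), key=lambda kv: kv[1])

# Invented sample advts, for illustration only
ads = (
    [{"post_date": "2013-03-01", "industry": "IT"}] * 4
    + [{"post_date": "2013-03-01", "industry": "Pharma"}] * 2
    + [{"post_date": "2013-09-01", "industry": "IT"}] * 1
    + [{"post_date": "2013-09-01", "industry": "Pharma"}] * 2
)

print(decline_by(ads, "industry", earlier="2013-03", later="2013-09"))
```

Passing "city" , "company" or any other field name instead of "industry" answers the other questions in the list in the same way.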
One could even co-relate these graphs with other , publicly available statistical data such as :

Ø  IIP ( Index of Industrial Production )

Ø  Stock Market Index

Ø  Currency Exchange Rate ( eg; declining Rupee )

Ø  Decline in GDP / Increasing Fiscal Deficit

Ø  CAD ( Current Account Deficit )

Ø  Foreign Investments

Ø  Primary Bank Rates of RBI…………………………….etc


With proper co-relations , one could even predict how much the job market will further shrink , over the next 6 months !
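One simple way to turn such a co-relation into a prediction is a straight-line ( least-squares ) fit between one indicator and the number of live jobs. The sketch below is only an illustration with made-up figures ; the function `fit_line` and the choice of a single indicator are assumptions , and a real model would need many more data points and indicators.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical")
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b

# Invented figures: value of some indicator vs count of live jobs
indicator = [100.0, 98.0, 96.0, 94.0]
live_jobs = [30000, 26000, 22000, 18000]

a, b = fit_line(indicator, live_jobs)
predicted = a + b * 92.0  # live jobs expected if the indicator falls to 92
print(f"predicted live jobs = {predicted:.0f}")
```

Such a fit only extrapolates a past relationship ; it says nothing about why the two move together.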
Such a “ Predictive Model of Job Market “ , would be of immense interest to , not only the economists but also to the HRD Ministry / Planning Commission / Educational Institutions and of course the students themselves
I believe you could now , accelerate the pace of your project
I await to hear from you

hcp
------------------------------------------------------------------------------------------------------------------------------------------

04  Sept   2013

Rohini

Thank you for your email , mentioning that tomorrow you will let us know the exact status of the project
In the meantime , you may want to look up the following
InMobi Ad Network delivers every month , billions of Advts on millions of mobile phones in 165 countries
This portal shows some interesting methods of graphically displaying their findings , in a continuous / dynamic manner
You may , well , consider this ( portal ) to be THE LARGEST DATA MINING project ever undertaken ( barring , possibly , Google Analytics )
Although our project is very small ( only 5 million job advts and only approx 1000 added every day ) , we , too , should be able to present our analytics / findings in beautiful / meaningful graphs
And in our case , we want the visitors ( to our web site ) themselves , to be able to select any Search Parameter and be able to generate the graphs on-the-fly
hcp
 http://www.inmobi.com/hstar/netres/netres.php?country=India&month=6&year=2013&cmonth=3&cyear=2013
--------------------------------------------------------------------------------------------------------------------------------------



























