Wednesday, November 6, 2013

Good Luck Finding A Data Scientist

Challenge: With every company focused on big data, finding data scientists to manipulate, crunch and make sense of petabytes of data is nearly impossible. Moreover, most universities are only beginning to launch data science graduate programs, so candidates with degrees in the field are still a few years away, at best.

Why It's Important: Financial firms have been investing heavily in big data technologies for the past few years. The term big data is definitely overused, but there is no doubt that business and technology leaders are banking that big data will help provide analytics for a variety of needs in the very near future, including regulatory reporting, client targeting, trading strategies, portfolio management and more. Unless financial firms want their big data investments to sit idle, it is critical to find data scientists and analysts who are proficient enough in statistics to use big data technology effectively.

Where The Industry Is Now: Finding data professionals or data scientists, unfortunately, is no easy task. "Finding big data talent is difficult, retaining it is nearly impossible," said Dr. Usama Fayyad, chairman of Oasis500, and former CDO at Yahoo! "And the role of data scientist is impossible to fill, especially outside of the US."

Fayyad's comments are alarming, especially since Gartner estimates there will be the need for 4.4 million big data jobs by 2015, with almost 2 million of those positions in the U.S. And forecasts from the McKinsey Global Institute on the availability of big data talent are just as foreboding. The U.S. will face a shortage of approximately 190,000 data scientists and 1.5 million analysts and data managers by 2018, according to McKinsey.

Unfortunately, universities are not yet up to the task of producing large numbers of data scientists. Only a few schools have degrees that focus on the data sciences. For instance, Stanford offers online courses for data mining and statistics. The University of California at Berkeley offers a Master of Information and Data Science (MIDS), but the program is only starting up in January 2014. It will take most students about a year to complete the MIDS course, but keep in mind that many enrollees will likely have full-time jobs and will only take the courses part time, lengthening the time until they graduate.

Focus In 2014: With data scientists hard to find on the open market, and relief from universities a year or more away (at best), financial firms will need to invest in training and programs for employees who are interested in data analytics. "It is difficult to recruit people right out of school for these positions," said Thomas Statnick, global head, Treasury and Trade Solutions Technology at Citibank. "For big data, it is a hybrid role ... business and technology. You have to train them or recruit them from business positions. It takes business, engineering and computer science skills. It is an interesting hybrid of a position, but it is really hard to find those people."

Industry Leaders: Competition is fierce for data scientists, to say the least. As a result, most data experts are demanding a premium on the open market – especially those who have business skills and an understanding of a particular business segment. Companies such as PayPal, Google, eBay and Amazon are renowned for their use of data and they are always recruiting data scientists.

In financial services, a number of firms have established data science organizations. At State Street, David Saul is the Chief Scientist, and he spends a great deal of time focused on big data, semantic databases and other data disciplines. PayPal, Square and Deutsche Bank have all had chief scientist roles in their organizations as well.

Price Tag: Recruiters and hiring managers at financial firms say that finding qualified candidates with data analytics, statistical skills and knowledge of the financial services business is extremely difficult, if not impossible. Those candidates with the skills on their resumes command top dollar in today's market. As a result, many firms are choosing to recruit and train employees internally.

ABOUT THE AUTHOR

Greg MacSweeney is editorial director of InformationWeek Financial Services, whose brands include Wall Street & Technology, Bank Systems & Technology, Advanced Trading, and Insurance & Technology.

 

Tuesday, June 11, 2013

Start a career in Analytics... with Excel 2010

Original source: http://analyticsindiamag.com/start-a-career-in-analytics-with-excel-2010/

Talk to people who want a career in analytics and one of the first questions they will ask is, ‘How can I have a career in analytics with no access to software like SAS or SPSS?’ And my answer is: have you looked at Excel 2010 and its analytics capabilities? If you are like 80% of the population, the answer is ‘No way. Excel 2010?’

Let’s do a quick check of our old Excel in its not-so-new ‘avatar’, Excel 2010. As a spreadsheet, each worksheet can hold 1,048,576 rows and 16,384 columns of data, way up from the 65,536 rows and 256 columns in the Excel of old.
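To get a feel for those grid limits, here is a small Python sketch (not something Excel itself provides; the helper name `column_letters` is made up for this illustration). It assumes the usual A, B, ..., Z, AA, AB lettering of columns and works out which letters the last column carries.

```python
def column_letters(index: int) -> str:
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    letters = ""
    while index > 0:
        # Shift to 0-based so that 26 maps to 'Z' rather than rolling over.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


ROWS_2010 = 2 ** 20  # 1,048,576 rows per worksheet
COLS_2010 = 2 ** 14  # 16,384 columns per worksheet

print(ROWS_2010, COLS_2010)       # 1048576 16384
print(column_letters(COLS_2010))  # XFD, the last column in Excel 2010
print(column_letters(256))        # IV, the last column in the older grid
```

Both limits are powers of two, which is why the numbers look so odd at first glance.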

There are 403 functions, and we can now have 255 arguments in a function and nest 64 levels of functions per formula. So complex customised calculations become easy, which in turn makes data cleaning and massaging very easy. You have a plethora of text and numeric functions, and lots of date-related calculations can be done, which rival software like SAS and SPSS. Once the data is in the right form for the project, you can enable the Analysis ToolPak add-in. This contains a set of 19 wizard-driven statistical processes, from descriptive statistics to ANOVA to regression. A judicious use of the functions, formulae and the Analysis ToolPak will allow you to get nearly all the commonly used outputs for business decision making.

And then there is PowerPivot, a free download and a great add-in. It multiplies the BI capabilities of Excel and allows for merging data from various sources and manipulating that data. It uses DAX (Data Analysis Expressions), a language that enables more complex grouping and calculations and thus better analysis. (You can read more about this and download it from http://technet.microsoft.com/en-us/library/ff452206.aspx)

On the data visualisation front, the new Sparklines feature, which draws small graphs inside a cell and gives a quick and ready reading of trends, is a very user-friendly and analysis-friendly addition. Since Sparklines can be used along with pivot tables, the utility is immense. It becomes very easy to summarise, draw conclusions and segment the reports that you will create.

Recording and using recorded macros is much more robust in this edition of Excel, ensuring that even non-coders (people who do not write VBA) can use macros for repetitive tasks, formatting and generation of standard code. Simplifying and standardising helps you to move to the next level of analysis and spend more time on validation and conclusions rather than preparation.

Sounds good? Yes, and feels very good too. Try it out to get a closer look at this old and free application that we have taken for granted.

This usability has led to companies building paid add-ins that work on Excel, and my favourite is XLSTAT. As the name suggests, it simplifies the more complex statistical processes and makes them button-driven. So doing a factor analysis, Pareto analysis, decision tree, cluster analysis or logistic regression becomes that much easier.

You, of course, have to remember that Excel is a competent tool for analysis, but the conceptual understanding has to be built by you. Thus, the success of the project will depend on your knowledge of the subject and your ability to use the tool.

With extensive online help and many forums dedicated to it, Excel 2010 has finally ‘arrived’! I strongly believe in the potential of this software, and the easy accessibility just adds to its charms.

So all you potential analysts out there – pull up your boots and get cracking …

 

Saturday, March 30, 2013

Avoiding spreadsheet Hell

Spreadsheets are ubiquitous, but they are also a major source of risk, as high-profile examples have shown. Tom Groenfeldt reports on how firms ensure spreadsheet integrity.

The JP Morgan Task Force Report into its Chief Investment Office’s $6 billion-plus loss found the bank’s Value at Risk was being calculated with an Excel spreadsheet that “required time-consuming manual inputs to entries and formulas, which increased the potential for errors”.

At another point the report found “the model operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another”.

JPM is not alone: more than half of C-level executives in financial institutions say they have few or no controls over critical spreadsheets at their firms.

Nearly nine in ten (89%) rely on manual oversight to maintain data integrity, while only 11% report having an automated controls policy that lets them fully understand changes between different versions of spreadsheets.

The research, conducted by Vision Critical for ClusterSeven, which sells spreadsheet control software, seems to indicate a large gap between executives’ concerns and their actions to manage risk. Just over half (55%) of the top executives rate spreadsheet risk as either serious or very serious.

“Financial services firms, and the senior managers and executives that run them, rely heavily on spreadsheets for much of their business critical processes,” said Ralph Baxter, chief executive of ClusterSeven. “However, there are significant risks associated with this and all stakeholders are now waking up to what these are. Risks include anything from basic ‘cut and paste’ errors to miscalculations, fraud and corrupted files.”

Excel, easily the most widely used tool in finance, creates risk in the financial enterprise. Designed for the desktop, its power and ease of use have led to its widespread use in analytics, trading, modelling and reporting. Excel, designed for individual users, lacks controls, lockdowns and audit trails.

PwC has built an entire consulting practice around Excel, beginning with identification and risk measurement of end user computing (EUC) within a firm. “PwC recognises that EUC’s are here to stay. Rather than attempting to remove them, we believe that organisations should understand their use and make sure that they are properly controlled,” the company explains. It calls for developing a governance framework and then deciding when and where software should be used to monitor and control spreadsheets.

Implementing spreadsheet management often uncovers material errors in a company’s accounts. In helping one firm move to an ERP system, PwC exported data out of hundreds of spreadsheets and modelled the results, which uncovered “a number of significant errors.”

The consultancy notes that a number of regulations including Solvency II, SOX and Basel III will require improved controls over spreadsheets.

In London, the FSA has required financial firms to manage their spreadsheets, especially those used to feed their capital adequacy modelling under Solvency II, said Mark Allen, head of business intelligence at Canopius Managing Agents, an international insurance and reinsurance group based in London. It turned to ClusterSeven.

“We have approaching 1,000 key business spreadsheets managed under ClusterSeven,” he said. “We haven’t done all of them, and that is too many to be honest: some spreadsheets don’t have material effect.”

“Whenever possible we use SQL for financial reporting systems, but it’s not realistic to replace spreadsheets. While we have our more complex Solvency II data in databases, there will always be the need to pull together information for presentations in spreadsheets.”

Canopius found ClusterSeven didn’t create problems for business users. “Some comparable systems are quite invasive. This just sits in the background and tracks changes. It’s much easier to roll out software when it doesn’t get in the way of users.”

The principle can be extended to other off-the-shelf applications. ClusterSeven offers similar controls for the MS Access Database, which is tucked away in thousands of organisations running vital departmental tasks, often little known to anyone outside the group where it operates.

Another approach to improving on standalone Excel is available from WestClinTech in the US. It has built XLeratorDB, which can run refined Excel calculations, and additional financial calculations, inside SQL Server, where they can run up to 100 times faster than in Excel and are fully protected by server security.

A commercial real estate financing company in the US with a market cap over $7 billion uses XLeratorDB to improve reporting and projections. When it relied on Excel spreadsheets, individual asset managers in different regions developed their own ways of using the models, the firm’s chief information officer said.

“The process was inconsistent, there was no visibility,” he added. If a regional manager failed to update quarterly results in Excel, that would not necessarily be apparent to headquarters. “It only took one or two managers not to update their numbers to realise that we needed to correct this.”

 

Friday, March 22, 2013

CELL Function

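The CELL function is a standard Excel function that returns information about the formatting, location or contents of a cell. With the "filename" argument it returns the full path, workbook name and sheet name. See CELL in Excel's help for details.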

Formula               Result
=MyName()             Workbook Path.xls
=MyFullName()         C:\OzGrid\Learning\Workbook Path.xls
=CELL("filename")     C:\OzGrid\Learning\[Workbook Path.xls]Sheet1
=sheetname(A1)        Sheet1
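Of the four formulas, only CELL is built into Excel; MyName, MyFullName and sheetname appear to be user-defined functions. The pieces they return can all be read out of the CELL("filename") result, which has the shape folder, then [workbook], then sheet. The Python sketch below illustrates that split outside Excel. The names `CellFilename` and `parse_cell_filename` are hypothetical, and it assumes the workbook name itself contains no square brackets.

```python
import re
from typing import NamedTuple


class CellFilename(NamedTuple):
    folder: str    # path up to and including the trailing separator
    workbook: str  # file name found between the square brackets
    sheet: str     # worksheet name that follows the closing bracket


# Greedy folder match, so the LAST "[" in the string starts the workbook name.
_PATTERN = re.compile(r"^(?P<folder>.*)\[(?P<workbook>[^\]]+)\](?P<sheet>.*)$")


def parse_cell_filename(value: str) -> CellFilename:
    """Split a CELL("filename")-style string into folder, workbook and sheet."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a CELL filename result: {value!r}")
    return CellFilename(match["folder"], match["workbook"], match["sheet"])


parts = parse_cell_filename(r"C:\OzGrid\Learning\[Workbook Path.xls]Sheet1")
print(parts.workbook)                 # Workbook Path.xls
print(parts.folder + parts.workbook)  # C:\OzGrid\Learning\Workbook Path.xls
print(parts.sheet)                    # Sheet1
```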

 

Tuesday, March 19, 2013

IMP operator

Purpose

The IMP operator works as both a logical and a bitwise arithmetic operator.

Syntax

p IMP q

Remarks

IMP as a logical operator

The IMP operator returns FALSE (zero) if and only if its first operand is TRUE (non-zero), and its second operand is FALSE.  In all other cases, it returns TRUE.

Truth table

x    y    x IMP y
T    T    T
T    F    F
F    T    T
F    F    T
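The truth table can be reproduced in a language that has no IMP operator by using the equivalence p IMP q = (NOT p) OR q. The following is a minimal Python sketch of that equivalence; the function name `imp` is chosen here for illustration only.

```python
from itertools import product


def imp(p: bool, q: bool) -> bool:
    """Logical implication: False only when p is True and q is False."""
    return (not p) or q


# Print the four rows in the same order as the truth table.
for p, q in product([True, False], repeat=2):
    row = ["T" if value else "F" for value in (p, q, imp(p, q))]
    print("    ".join(row))
```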

Using IMP as a bitwise arithmetic operator

IMP is seldom used as a bitwise arithmetic operator, but here is a sample:
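One way to illustrate the bitwise case is to apply the logical rule to each pair of bits, which is the same as computing (NOT p) OR q on the whole number. The sketch below is in Python rather than BASIC. The helper `imp_bits` and its `width` parameter are assumptions made for this example: the result is masked to a fixed number of bits so that it stays non-negative and easy to read.

```python
def imp_bits(p: int, q: int, width: int = 4) -> int:
    """Bitwise implication of p and q, limited to `width` bits."""
    mask = (1 << width) - 1
    # A result bit is 0 only where p has a 1 and q has a 0.
    return (~p | q) & mask


p = 0b1100
q = 0b1010
print(format(imp_bits(p, q), "04b"))  # 1011

# Without the mask, two's-complement arithmetic gives a negative number.
print(~5 | 3)  # -5
```

Reading the first result column by column gives the four rows of the truth table: the bit pairs (1,1), (1,0), (0,1) and (0,0) produce 1, 0, 1 and 1.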

 

 

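The IMP (IMPlies) operator is useful when you want to express a rule that limits access to a certain "privilege". For example, suppose you have a swimming pool and you don't want anyone entering it when the guard is not present. Let P = pool is accessible and Q = guard is present. The rule is P IMP Q: if the pool is accessible, the guard must be present. The expression is FALSE only when P is true and Q is false (the pool is open but the guard is not there), which is exactly the unsafe situation. In every other case it is TRUE, so checking that (P IMP Q) holds ensures the safety of pool users.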