**Data Analyst Interview Questions for Freshers and Experienced Candidates**

**1. What is involved in typical data analysis?**

Data analysis involves collecting and organizing data, relating the findings to the rest of the company and the market, and then thinking creatively about solutions to existing problems, or pointing out emerging problems and initiating preventive measures.

**2. Do you have any experience with corporate websites?**

Relevant experience might involve creating the rules for various corporate consumer websites, depending on market conditions and company goals. Analysts might also help design website features, so that they align with their analysis of company data and performance predictions.

**3. How do you handle databases as a data analyst?**

Data analysts may be responsible for database design and security. They are responsible for upgrading the database on a regular basis so that it meets the demands of the market and company needs. And they make sure that the database runs smoothly in general.

**4. Are you comfortable spending prolonged periods of time in front of a desktop computer?**

Data analysts spend much of their time at a computer. The job involves little physical strain, but it does demand the ability to stay focused while analyzing data and proposing creative solutions.

**5. Have you ever developed any programs or products relevant to your profession?**

As a business analyst working in the software and technology industry, you might have experience actually developing information-related products for the company, which can be used for data analysis and data presentation.

**6. What are your best traits that are suitable for this position?**

Name relevant skills including one’s:

- Excellent analytical skills
- Exceptional communication skills
- Outstanding writing abilities
- Orientation to detail
- Advanced knowledge of current programs such as Microsoft Access and Excel
- Demonstrated problem solving abilities

**7. Describe a time when you solved a problem that benefited your company.**

Though providing analyses may always be helpful in one way or another, the interviewers are looking to see if you have done anything to directly contribute to a past company’s noteworthy success. You must then:

- Talk about how you understood the origins of the problems
- Indicate that you were able to delve into the details and fix what was necessary
- Discuss the desired results
- Conclude with mentioning how your positive, optimistic, and focused attitude assisted in resolving such an issue

**8. Can you outline the various steps in an analytics project?**

Broadly speaking these are the steps. Of course these may vary slightly depending on the type of problem, data, tools available etc.

**Problem definition:** The first step is to understand the business problem. What is the problem you are trying to solve, and what is the business context? Very often, however, your client may just give you a whole lot of data and ask you to do something with it. In such a case you would need to take a more exploratory look at the data. If the client has a specific problem that needs to be tackled, then the first step is to clearly define and understand the problem. You will then need to convert the business problem into an analytics problem. In other words, you need to understand exactly what you are going to predict with the model you build. There is no point in building a fabulous model, only to realise later that what it is predicting is not exactly what the business needs.

**Data exploration:** Once you have the problem defined, the next step is to explore the data and become more familiar with it. This is especially important when dealing with a completely new data set.

**Data preparation:** Now that you have a good understanding of the data, you will need to prepare it for modelling. You will identify and treat missing values, detect outliers, transform variables, create binary variables if required, and so on. This stage is strongly influenced by the modelling technique you will use at the next stage. For example, regression involves a fair amount of data preparation, decision trees may need less, and clustering requires a different kind of preparation from other techniques.

**Modelling:** Once the data is prepared, you can begin modelling. This is usually an iterative process where you run a model, evaluate the results, tweak your approach, run another model, evaluate the results, re-tweak, and so on. You go on doing this until you come up with a model you are satisfied with, or one you feel is the best possible result with the given data.

**Validation:** The final model (or maybe the best 2-3 models) should then be put through the validation process. In this process, you test the model using a completely new data set, i.e. data that was not used to build the model. This ensures that your model is a good model in general and not just a very good model for the specific data used earlier. (Technically, this is called avoiding overfitting.)

**Implementation and tracking:** The final model is chosen after validation. Then you start implementing the model and tracking the results. You need to track results to see the performance of the model over time. In general, the accuracy of a model goes down over time. How quickly depends on the variables (how dynamic or static they are) and on the general environment (how static or dynamic that is).

**9. What do you do in data exploration?**

Data exploration is done to become familiar with the data. This step is especially important when dealing with new data. There are a number of things you will want to do in this step –

- What is there in the data – look at the list of all the variables in the data set. Understand the meaning of each variable using the data dictionary. Go back to the business for more information in case of any confusion.
- How much data is there – look at the volume of the data (how many records), look at the time frame of the data (last 3 months, last 6 months etc.)
- Quality of the data – how much missing information, quality of data in each variable. Are all fields usable? If a field has data for only 10% of the observations, then maybe that field is not usable etc.
- You will also identify some important variables and may investigate them more deeply, for example by looking at averages, min and max values, and perhaps the 10th and 90th percentiles as well.
- You may also identify fields that you need to transform in the data prep stage.
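
The checks listed above can be sketched in a few lines of Python. This is only an illustration using the standard library: the `incomes` field, its values, and the `explore` helper are hypothetical, and `None` is assumed to mark a missing value.

```python
import statistics

# Hypothetical sample: monthly income per customer; None marks a missing value.
incomes = [3200, 4100, None, 2800, 5200, None, 3900, 4600, 3100, 7000]

def explore(values):
    """Return basic exploration figures for one numeric field."""
    present = [v for v in values if v is not None]
    # quantiles(n=10) returns the 9 cut points between deciles:
    # index 0 is the 10th percentile, index 8 is the 90th.
    deciles = statistics.quantiles(present, n=10, method="inclusive")
    return {
        "records": len(values),
        "missing_share": (len(values) - len(present)) / len(values),
        "mean": statistics.mean(present),
        "min": min(present),
        "max": max(present),
        "p10": deciles[0],
        "p90": deciles[8],
    }

summary = explore(incomes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A field whose `missing_share` is very high (the 90% case mentioned above) would be a candidate for exclusion.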

**10. What do you do in data preparation?**

- In data preparation, you will prepare the data for the next stage i.e. the modelling stage. What you do here is influenced by the choice of technique you use in the next stage.
- But some things are done in most cases, for example identifying missing values and treating them, identifying outlier values (unusual values) and treating them, transforming variables, and creating binary variables if required.
- This is also the stage where you partition the data, i.e. create training data (for modelling) and validation data (for validation).
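
As a rough sketch of the partitioning step, the snippet below shuffles the records and splits them into training and validation sets. The 70/30 split, the fixed seed, and the function name `split_train_validation` are assumptions made for the example, not a prescribed standard.

```python
import random

def split_train_validation(records, validation_share=0.3, seed=42):
    """Shuffle a copy of the records and split it into two partitions."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - validation_share))
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))  # stand-in for 100 rows of data
train, validation = split_train_validation(records)
print(len(train), len(validation))
```

The model is built on `train` only; `validation` is held back so that it can later play the role of data the model has never seen.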

**11. How will you treat missing values?**

The first step is to identify variables with missing values. Assess the extent of missing values. Is there a pattern in missing values? If yes, try and identify the pattern. It may lead to interesting insights.

If no pattern, then we can either ignore missing values (SAS will not use any observation with missing data) or impute the missing values.

Simple imputation – substitute with mean or median values

OR

Case-wise imputation – for example, if we have missing values in the income field.
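
Simple imputation can be illustrated with a short Python sketch. The `impute` helper and the sample `incomes` list are hypothetical, and `None` is assumed to represent a missing value.

```python
import statistics

def impute(values, strategy="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

incomes = [30, 40, None, 50, 200, None]
print(impute(incomes, "mean"))
print(impute(incomes, "median"))
```

With a skewed field like this one (the value 200 pulls the mean up), the median is often the more cautious choice.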

**12. How will you treat outlier values?**

- You can identify outliers using graphical analysis and univariate analysis. If there are only a few outliers, you can assess them individually. If there are many, you may want to substitute the outlier values with the 1st percentile or the 99th percentile values.
- If there is a lot of data, you may decide to ignore records with outliers.
- Not all extreme values are outliers. Not all outliers are extreme values.
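
One way to sketch the percentile substitution described above is to clamp every value to the 1st and 99th percentile cut points. The `cap_outliers` helper and the sample data are made up for illustration.

```python
import statistics

def cap_outliers(values):
    """Clamp values below the 1st or above the 99th percentile to those cut points."""
    # quantiles(n=100) returns 99 cut points: index 0 is the 1st percentile,
    # index 98 is the 99th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[0], cuts[98]
    return [min(max(v, low), high) for v in values], low, high

# Hypothetical data: 0..199 plus one very large and one very small value.
values = list(range(200)) + [10_000, -5_000]
capped, low, high = cap_outliers(values)
print(low, high, min(capped), max(capped))
```

Note that this treats every value beyond the cut points as an outlier, which is exactly the assumption the last bullet above warns against; in practice the flagged values should still be reviewed.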

**13. What are the data validation methods used by data analysts?**

Usually, the methods data analysts use for data validation are:

- Data screening
- Data verification

**14. According to you, which are the qualities that a data analyst must possess to be successful at this position?**

Analytical and problem-solving skills are crucial to be successful at this position. Also, one needs to be skilled at organizing and formatting data, so that information is available in an easy-to-read manner. Technical proficiency is important too, especially if your organization depends largely on software and data processing tools.

**15. Describe the most difficult database problem you have faced. Why was it more difficult than other analytical problems you solved?**

My toughest challenge was to predict sales during the recession period and estimate the losses the organization might suffer in the coming quarter. It was difficult because I had to interpret information and forecast future trends. Until then, I had only analyzed information I already had and drawn conclusions about what had already happened. I had to evaluate the impact of receding economic conditions on different income groups and infer the purchasing capacity of each group. I used different statistical methods and economic parameters to reach this conclusion. (This is a personal interpretative question that must be answered as per your experience. Introduce the problem and highlight the exact nature of the problems faced. Give details of how you perceived the problem, analyzed the information, applied principles of analysis, managed information and found solutions to the objectives.)

**16. What procedure do you follow to analyze the given data?**

The procedure I adopt to solve analytical problems depends upon the objective of the analysis. First, I organize the information into categories. Second, I look for relationships between two or more categories, where a change in one category affects the results of another. Beyond that, the procedure includes data cleaning, defining the structure of samples, determining quality verification measures, computing statistics and reporting on the analysis.

**17. What do you understand from the term data cleaning?**

Data cleaning refers to the task of inspecting and cleaning data. From the given data, it is important to sort out the information that is valuable for analysis, and to eliminate information that is incorrect, unnecessary or repetitive. However, the entire database should remain retrievable. Data cleaning does not mean deleting information completely from the database.

**18. Besides analyzing, have you ever participated in database designing and database development tasks?**

I have not directly participated in database design and development activities, since we had a dedicated team to perform those functions. However, I have given valuable input to those teams during discussions and helped them with research and data formatting. (Customize this answer as per your experience.)

**19. I have provided you here with some data regarding the sales figures of the last 5 quarters. It shows the expenses incurred as part of the sales activities. How would you analyze this data?**

(The answer to this question depends upon the nature of the problem that you are given to solve. However, here are some steps that you should follow while solving the problem. Read the problem and the data carefully. Ask for an explanation of any terms that you do not understand. Extract the unusual and interesting trends that need attention. Apply the relevant statistical analysis methods and principles. Make a presentation or give a verbal account of your analysis, as requested.)

**20. Which technical tools do you use often for analysis and presentation purpose?**

The tools to be used depend upon the nature of the analysis being done. I am comfortable using QDA (Qualitative Data Analysis) Miner, KNIME (Konstanz Information Miner), ROOT, PAW, etc. Most of my work is done in MS Excel, using features such as Shewhart control charts, stratification, histograms, scatter diagrams, correlation and covariance. I am keen to learn any new tools that may be necessary to work at this organization.

**21. What are the tools used in Big Data?**

Tools used in Big Data include:

- Hadoop
- Hive
- Pig
- Flume
- Mahout
- Sqoop

**22. What is KPI, design of experiments and 80/20 rule?**

KPI: It stands for Key Performance Indicator. It is a metric that tracks how a business process is performing, typically presented through some combination of spreadsheets, reports or charts.

Design of experiments: It is the initial process used to split your data, sample it, and set it up for statistical analysis.

80/20 rule: Also known as the Pareto principle, it means that roughly 80 percent of your income comes from 20 percent of your clients.

**23. What is Map Reduce?**

Map-reduce is a framework to process large data sets, splitting them into subsets, processing each subset on a different server and then blending results obtained on each.
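
The idea can be imitated on a single machine with plain Python. The sketch below is not a real MapReduce framework such as Hadoop; the three phase functions and the word-count task are assumptions chosen for illustration, and each chunk stands in for the subset a separate server would process.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_phase(mapped_chunks):
    """Group all emitted values by key across chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}

chunks = ["big data big tools", "data tools data"]
mapped = [map_phase(chunk) for chunk in chunks]  # one call per "server"
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```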

**24. What is Clustering? What are the properties for clustering algorithms?**

Clustering is a method for grouping data without predefined labels. A clustering algorithm divides a data set into natural groups, or clusters.

Properties of clustering algorithms are:

- Hierarchical or flat
- Iterative
- Hard and soft
- Disjunctive
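
To make these properties concrete, here is a minimal sketch of k-means, which is a flat, iterative, hard clustering algorithm. The sample points and the starting centroids are made up, and a production implementation would choose starting centroids and a stopping rule more carefully.

```python
def kmeans(points, centroids, iterations=10):
    """Flat, hard, iterative clustering of 2-D points (a basic k-means)."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return clusters, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
clusters, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(clusters)
print(centroids)
```

Each point ends up in exactly one cluster (hard), there is no hierarchy between clusters (flat), and the result is refined over repeated passes (iterative).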

**25. What are some of the statistical methods that are useful for data-analyst?**

Statistical methods that are useful for data analysts are:

- Bayesian method
- Markov process
- Spatial and cluster processes
- Rank statistics, percentile, outliers detection
- Imputation techniques, etc.
- Simplex algorithm
- Mathematical optimization

**26. What is time series analysis?**

Time series analysis can be done in two domains: the frequency domain and the time domain. In time series analysis, the output of a particular process can be forecast by analyzing its previous data with methods such as exponential smoothing and log-linear regression.
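
As an illustration of one of these methods, the sketch below applies simple exponential smoothing, where each smoothed value is `alpha * observation + (1 - alpha) * previous smoothed value`. The `sales` figures and the choice of `alpha=0.5` are assumptions for the example.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed series; the last value doubles as a one-step forecast."""
    smoothed = [series[0]]  # initialise with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [100, 110, 105, 120, 130]
smoothed = exponential_smoothing(sales, alpha=0.5)
print(smoothed)
```

A larger `alpha` makes the forecast react faster to recent values; a smaller one gives more weight to history.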

**27. What is correlogram analysis?**

A correlogram analysis is a common form of spatial analysis in geography. It consists of a series of estimated autocorrelation coefficients, each calculated for a different spatial relationship. It can be used to construct a correlogram for distance-based data, when the raw data is expressed as distances rather than as values at individual points.
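
A simplified sketch of the idea follows. It assumes readings taken at equally spaced points along a line, so that the lag in steps stands in for the spatial relationship; a full spatial correlogram would group pairs of points into distance classes instead. The function names and the sample readings are hypothetical.

```python
def autocorrelation(values, lag):
    """Estimate the autocorrelation coefficient between points `lag` steps apart."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance

def correlogram(values, max_lag):
    """Return the coefficients for lags 0..max_lag."""
    return [autocorrelation(values, lag) for lag in range(max_lag + 1)]

# Hypothetical readings at equally spaced points along a transect.
readings = [5, 7, 9, 7, 5, 7, 9, 7, 5, 7, 9, 7]
coefficients = correlogram(readings, max_lag=4)
for lag, r in enumerate(coefficients):
    print(lag, round(r, 3))
```

Plotting the coefficients against the lag gives the correlogram; here the repeating pattern in the readings shows up as a positive coefficient at lag 4 and a negative one at lag 2.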

**28. What is a hash table?**

In computing, a hash table is a map of keys to values. It is a data structure used to implement an associative array. It uses a hash function to compute an index into an array of slots, from which the desired value can be fetched.

**29. What are hash table collisions? How is it avoided?**

A hash table collision happens when two different keys hash to the same value. Two items cannot be stored in the same slot of the array.

There are many techniques for handling hash table collisions; here we list two:

**Separate chaining:** It uses a secondary data structure (such as a linked list) to store multiple items that hash to the same slot.

**Open addressing:** It searches for other slots, using a probing sequence or a second hash function, and stores the item in the first empty slot that is found.
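
A minimal sketch of separate chaining is shown below. The class name `ChainedHashTable` and the choice of four slots are assumptions for the example, and it relies on the CPython behaviour that a small integer hashes to itself, so the keys 1 and 5 collide when there are four slots.

```python
class ChainedHashTable:
    """A small hash table that resolves collisions by separate chaining."""

    def __init__(self, slots=4):
        self.buckets = [[] for _ in range(slots)]

    def _index(self, key):
        # The hash function maps the key to one of the slots.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # colliding keys share the bucket

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)

table = ChainedHashTable(slots=4)
table.put(1, "one")
table.put(5, "five")  # in CPython, 1 and 5 both map to slot 1 of 4
print(table.get(1), table.get(5))
```

Both values remain retrievable even though they share a slot, at the cost of a short scan through the bucket.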

**30. Which imputation method is more favorable?**

Although single imputation is widely used, it does not reflect the uncertainty created by data that are missing at random. So multiple imputation is more favorable than single imputation when data are missing at random.

**31. What is n-gram?**

**N-gram:**

An n-gram is a contiguous sequence of n items from a given sequence of text or speech. An n-gram model is a type of probabilistic language model that predicts the next item in such a sequence from the preceding n-1 items, in the form of an (n-1)-order Markov model.
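
The sketch below extracts n-grams from a token list and builds a tiny bigram (n = 2) predictor from follower counts. The sample sentence and the helper names are made up, and a real language model would add smoothing for unseen sequences.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return every contiguous run of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train_bigram_model(tokens):
    """Count which token follows each token (n = 2, so the context is 1 token)."""
    followers = defaultdict(Counter)
    for previous, current in ngrams(tokens, 2):
        followers[previous][current] += 1
    return followers

def predict_next(model, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    if token not in model:
        return None
    return model[token].most_common(1)[0][0]

tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigram_model(tokens)
print(ngrams(tokens, 3)[:2])
print(predict_next(model, "the"))
```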

**32. What are the criteria for a good data model?**

Criteria for a good data model include:

- It can be easily consumed
- It should scale well when the data changes substantially
- It should provide predictable performance
- A good model can adapt to changes in requirements

**33. What is imputation? List out different types of imputation techniques?**

During imputation, we replace missing data with substituted values. The types of imputation techniques are:

- Single imputation
  - **Hot-deck imputation:** A missing value is imputed from a randomly selected similar record (a technique that dates back to punch-card processing)
  - **Cold-deck imputation:** It works the same way as hot-deck imputation, but it selects donors from another dataset
  - **Mean imputation:** It involves replacing a missing value with the mean of that variable for all other cases
  - **Regression imputation:** It involves replacing a missing value with the value predicted for that variable from other variables
  - **Stochastic regression:** It is the same as regression imputation, but it adds the average regression variance to the regression imputation
- Multiple imputation
  - Unlike single imputation, multiple imputation estimates the values multiple times
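
Regression imputation, one of the single imputation techniques listed above, can be sketched as follows. The example assumes one predictor and one field with gaps, fits an ordinary least-squares line on the complete rows, and fills each gap with the predicted value. The helper names and the sample rows are hypothetical; stochastic regression would additionally add a random error term to each prediction.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope

def regression_impute(rows):
    """Fill a missing second field using a line fitted on the complete rows."""
    complete = [(x, y) for x, y in rows if y is not None]
    intercept, slope = fit_line(*zip(*complete))
    return [(x, intercept + slope * x if y is None else y) for x, y in rows]

# Hypothetical rows: (years of experience, income in thousands).
rows = [(1, 30), (2, 40), (3, None), (4, 60), (5, 70)]
filled = regression_impute(rows)
print(filled)
```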

34. How have you designed and built data reports and reporting tools in the past to help business managers make decisions?

35. What experience do you have in this field? How would you use your past experience in this job?

36. What is involved in typical data analysis?

37. Describe two or three major trends in your field?

38. Why did you choose this profession/field?

39. What tertiary qualifications have you attained that relate to the data analyst role?

40. What made you choose to apply for the data analyst position?

41. How would you measure job performance in the data analyst position?

**II. Interview tips for Financial data analyst**

You can use the interview tips below to prepare for the interview process:

1. Research the employer: history, products and services, competitors, structure, and so on.

2. Identify the job description, job specs, and job goals for the data analyst role.

3. Ask yourself how you can demonstrate your competencies against the job specs.

4. List technical interview questions for the data analyst role.