Looker Interview Questions for Freshers and Experienced Candidates
1. What exactly is business intelligence, according to you?
Business intelligence is the combination of approaches that an organization uses for data analysis. Useful data can be generated from bulk information that seems totally useless, and the biggest benefit of generating it is that informed decisions can be built on top of it. Many organizations have attained a great deal of success on the strength of this strategy alone. Business intelligence also helps an organization keep the competition at bay to a good extent, and several other issues can be eliminated by gathering useful information from sources that seem highly unreliable.
2. What do you mean by SSIS? Does it have any direct relation with SQL Server?
SSIS stands for SQL Server Integration Services. It is widely adopted for performing important tasks related to both ETL and data migration. It is also very useful for enabling automatic maintenance of SQL Server, which is why it is considered closely related to SQL Server. Although maintenance is not required on a regular basis, this approach is highly beneficial.
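To make the ETL idea concrete, here is a minimal Python sketch of the extract-transform-load pattern that a tool like SSIS automates at scale; the columns and the output file name are hypothetical stand-ins, not anything SSIS-specific.

```python
import pandas as pd

# Extract: pull raw records from a source (inlined here for simplicity).
raw = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": ["10.50", "N/A", "7.25"],
})

# Transform: coerce types and drop rows that fail conversion.
raw["amount"] = pd.to_numeric(raw["amount"], errors="coerce")
clean = raw.dropna(subset=["amount"])

# Load: write the cleaned data to a destination (a hypothetical CSV file).
clean.to_csv("orders_clean.csv", index=False)
```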
3. Name the three categories in the data flow?
These are Transformations, Data Sources, and Data Destinations. Users can also define other categories if the need arises; however, not all features will work with such a custom category.
4. Is it possible for businesses to utilize their existing resources for Business Intelligence, or do they need experts?
It actually depends on the business. Most organizations have realized that there is no real need for dedicated experts: the current workforce can be trained, and the desired outcomes can reasonably be expected from them. It does not take long to train employees in this domain, and because BI is a straightforward strategy, organizations can easily keep pace in every aspect.
5. Between File System Deployment and SQL Server Deployment, which one is better and why? Is information exchange possible between them?
Generally, experts prefer SQL Server Deployment, because it provides quick results without compromising safety. And yes, information exchange between the two is possible.
6. Are you familiar with the cache modes available in Looker? How many of them are present?
There are basically three modes, all equally powerful: Full Cache mode, Partial Cache mode, and No Cache mode.
7. What exactly do you know about the Full cache mode in Looker?
Basically, this is one of the most powerful modes: SSIS reads the entire reference dataset into memory before the prime activities begin, and lookups are served from that cache until the end of the task. Data loading is one of the prime things generally done in this mode.
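As a rough analogy in Python (not SSIS itself), full caching amounts to building the entire lookup table in memory before any rows flow through; the reference rows below are invented for illustration.

```python
# Full cache: the whole reference table is read into memory once,
# before any rows are processed.
reference_rows = [(1, "US"), (2, "UK"), (3, "DE")]  # stand-in reference table
cache = dict(reference_rows)  # built up front, before the data flow starts

# Every lookup is then served from memory, with no database round trips.
incoming = [3, 1, 2, 3]
enriched = [(key, cache[key]) for key in incoming]
print(enriched)
```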
8. Does logging have any relation with packages?
Yes, logging is very closely tied to the package level. Even when configuration is needed, it is done only at the package level.
9. What are the noticeable differences you can find upon comparing DTS and SSIS?
DTS stands for Data Transformation Services, while SSIS stands for SQL Server Integration Services.
- SSIS can handle a lot of errors irrespective of their complexity, size, and source; on the other hand, the error-handling capacity of DTS is limited
- There is actually no Business Intelligence functionality in DTS, while SSIS allows full Business Intelligence integration
- SSIS comes with an excellent development wizard; the same is absent in DTS
- When it comes to transformations, DTS cannot compete with SSIS
- SSIS supports .NET scripting, while DTS supports ActiveX scripting
10. What do you mean by the term drilling in data analysis?
Well, it is basically an approach used for exploring the details of data that seems useful, moving from a summary view down to the underlying records. It also helps with issues such as authenticity, since aggregated figures can be traced back to their source records.
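A small pandas sketch of the idea, with invented sales data: start from a regional summary, then drill down into the cities behind one region.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "city":   ["Boston", "NYC", "LA", "SF"],
    "amount": [100, 250, 175, 90],
})

# Summary level: total per region.
print(sales.groupby("region")["amount"].sum())

# Drill down into one region to see the city-level detail behind its total.
print(sales[sales["region"] == "East"].groupby("city")["amount"].sum())
```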
11. What exactly do you know about the execution of SSIS?
Integration Services provides multiple logging features that write log entries, which is generally useful when a run-time error appears. Logging is not enabled by default, but it can be used to write fully customized messages. A very large set of log providers is supported by Integration Services without any compatibility problems, and it is also possible to create log providers manually. All log entries can be written into text files very simply and without any third-party help.
12. What is pivoting?
Data can easily be switched from rows to columns and vice versa; this switching is known as pivoting. Pivoting makes sure that no information is lost from either the rows or the columns when they are exchanged by the user.
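A short pandas sketch with invented data: pivot turns row values into columns, and melt reverses the operation, so no information is lost either way.

```python
import pandas as pd

long_form = pd.DataFrame({
    "month":  ["Jan", "Jan", "Feb", "Feb"],
    "metric": ["sales", "costs", "sales", "costs"],
    "value":  [100, 60, 120, 70],
})

# Pivot: the row values of "metric" become columns.
wide = long_form.pivot(index="month", columns="metric", values="value")
print(wide)

# Unpivot (melt): the columns go back to rows.
print(wide.reset_index().melt(id_vars="month", var_name="metric"))
```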
13. Compare No Cache Mode with Partial Cache Mode?
In Partial Cache mode, SSIS caches rows as it looks them up: a row is fetched from the database the first time it is needed and served from the cache afterwards, which can cause issues when new rows arrive in quick succession. In No Cache mode, on the other hand, rows are generally not cached at all; users can customize the mode to allow some caching, but lookups otherwise happen one row at a time and therefore consume a lot of time.
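The difference is easy to see in a Python analogy (again, not SSIS itself); the lookup table below is invented for illustration.

```python
lookup_cache = {}

def db_lookup(key):
    # Stand-in for a real database round trip.
    return {1: "US", 2: "UK", 3: "DE"}[key]

def partial_cache_lookup(key):
    # Partial cache: a row is fetched once, then served from the cache.
    if key not in lookup_cache:
        lookup_cache[key] = db_lookup(key)
    return lookup_cache[key]

def no_cache_lookup(key):
    # No cache: every lookup goes back to the database, one row at a time.
    return db_lookup(key)

for key in [1, 1, 2, 1]:
    print(partial_cache_lookup(key))  # only two real database lookups occur
```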
14. What exactly do you know about the control flow?
All the containers and tasks that execute when the package runs are considered the control flow. Its prime purpose is to define the order of execution and control everything so as to provide the best outcomes. There are also conditions attached to running a task, and these are handled by the control flow. It is also possible to run several tasks repeatedly, which saves time and keeps things managed in the right manner.
15. What do you mean by the term OLAP?
OLAP stands for On-Line Analytical Processing. It is basically a strategy used for arranging multidimensional data. Although the prime goal is the analysis of data, the data can also be manipulated if the need arises.
16. In an analytics project, what are the steps which are important at every stage?
- Data exploration
- Defining the problem and its solution
- Data tracking and implementation
- Data modeling
- Data validation
- Data preparation
17. What exactly do you understand by the deployment of packages related to SSIS?
For this, there is a file known as the Manifest file, which needs to be run as part of the deployment. It ensures authenticated, reliable configuration for the packages without violating any policy. Users are free to deploy to SQL Server or to the file system, depending on their needs.
18. Can you name the component of SQL Server Integration Services which is considered for ad hoc queries?
For ad hoc queries, the best available component is the OLAP engine.
19. What are the control flow elements that are present in the SQL Server Integration Services?
These are
- Tasks, which are responsible for providing the actual functionality to the process
- Containers, which are responsible for offering structure within the different packages
- Precedence constraints, which connect the containers and executables in a defined sequence
Not all of these elements always need to be deployed in the same task, and they can be customized to a good extent.
20. Can you name a few tools that you can deploy for Data Analysis?
The most commonly used tools are RapidMiner, NodeXL, Wolfram Alpha, KNIME, Solver, Tableau, and Google Fusion Tables.
21. Name the methods that are helpful against multi-source problems?
The first is the identification of similar records, and the second is the restructuring of schemas.
22. In data analysis, what do you call the process that places the data in the columns and in the rows?
This is generally called slicing. Slicing makes sure that the data stays at its defined position or location, so no errors arise from the operation.
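A small pandas sketch with invented data: fixing one dimension (the year) slices out a sub-table, and every remaining value keeps its defined row-and-column position.

```python
import pandas as pd

cube = pd.DataFrame({
    "year":    [2022, 2022, 2023, 2023],
    "product": ["A", "B", "A", "B"],
    "units":   [10, 15, 12, 18],
})

# A slice fixes one dimension and keeps the rest of the layout intact.
print(cube[cube["year"] == 2023])
```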
23. According to you, what are the prime qualities that any expert data analyst must have?
The first thing is the right skills: the ability to collect, organize, and disseminate big data without compromising accuracy. The second is robust domain knowledge, of course; technical knowledge of databases is also required at several stages. In addition, a good data analyst must have leadership qualities and patience. Patience is required because gathering useful information from useless or unstructured data is not an easy job, and analyzing very large datasets can take time to yield the best outcomes.
24. Which container in a package is allowed for logging of information to a package log?
Every container or task is allowed to do this; however, it needs to be assigned during the initial stage of the operation.
25. Name a few approaches that you will consider for data cleaning?
Any general method can be applied. However, the first thing to consider is the size of the data: if it is too large, it should be divided into small components. Analyzing summary statistics is another approach that can be deployed, and creating utility functions is also very useful and reliable. The sketch below combines all three ideas.
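A minimal pandas sketch, assuming a hypothetical input file big_file.csv: the data is read in small chunks, a utility function cleans each chunk, and summary statistics are inspected at the end.

```python
import pandas as pd

def clean_chunk(chunk):
    # Utility function applied uniformly to every component of the data.
    return chunk.drop_duplicates().dropna()

# A large input divided into small components and cleaned piece by piece.
pieces = [clean_chunk(c) for c in pd.read_csv("big_file.csv", chunksize=10_000)]
data = pd.concat(pieces, ignore_index=True)

# Summary statistics help spot columns that still look wrong.
print(data.describe())
```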
26. What do you understand by the term Logistic regression?
It is basically an approach used to model a dataset that contains independent variables: it estimates the probability of a binary outcome based on how well that outcome depends on these variables. Once the variables are defined, it is not always easy to change them.
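A minimal scikit-learn sketch with toy data: the model estimates P(y = 1) as a logistic function of the independent variable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Independent variable X and a binary outcome y (invented toy data).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Probability that the outcome is 1 for a new observation.
print(model.predict_proba([[3.5]]))
```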
27. How well can you define data flow?
It is basically a task executed within an SSIS package that is responsible for data transformation. The source and the destination are always well defined, and users can keep pace with extensions and modifications because the flow scales up to a very good extent; users are always free to get further information about this from the support sections.
28. What are the basic issues in the data that can create a lot of trouble for the data analyst?
One of the biggest trouble creators is duplicate entries. Although these can be eliminated, full accuracy is not possible, because the same information is often available in a different format or worded differently. Common misspellings are another major source of trouble, and varying value representations can create a ton of issues. Moreover, values that are illegal, missing, or unidentifiable increase the chances of various errors and affect quality to a great extent.
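A short pandas sketch with invented records showing how some of these issues surface: duplicates hidden by inconsistent formatting, missing values, and an illegal (negative) age.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "alice", "Bob", None],
    "age":  [30, 30, -5, 25],
})

# Duplicate entries hidden by a different format (here, letter case).
print(df[df["name"].str.lower().duplicated(keep=False)])

# Missing values per column.
print(df.isna().sum())

# Illegal values, e.g. a negative age.
print(df[df["age"] < 0])
```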
29. What are the two common methods that can be deployed for data validation?
These are data verification and data screening. The two methods are similar in spirit but have different applications.
30. What do you mean by the term data cleansing?
It is simply another name for the data cleaning process. There are many approaches for eliminating inconsistencies and errors from datasets, and the combination of all these approaches is considered data cleansing. All of the approaches have a similar target, that is, to boost the quality of the data.