**Deep Learning Interview Questions for Freshers and Experienced Candidates**

**1. What is Deep Learning?**

Deep learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Deep learning can be supervised, semi-supervised, or unsupervised.

**2. Which data visualization libraries do you use, and why are they useful?**

It is valuable to explain your views on proper data visualization and your individual preferences when it comes to tools. Popular options include R's ggplot, Python's seaborn and matplotlib, and tools such as Plot.ly and Tableau.

**3. Where do you regularly source data-sets?**

Questions of this type are real tie-breakers. Anyone going into an interview should be prepared for this kind of question, because the answer shows how genuine your interest in machine learning is.

**4. What is the cost function?**

A cost function is a measure of how accurately the neural network performs with respect to a given training sample and its expected output. It is a single value, not a vector, because it rates the performance of the neural network as a whole. A common example is the mean squared error:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(\hat{Y}_i - Y_i)^2$$
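As a minimal sketch, the mean squared error can be computed in plain Python as follows. The function name and the sample values are assumptions made for illustration only.

```python
def mean_squared_error(predicted, expected):
    """Average of the squared differences between predictions and targets."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    n = len(predicted)
    return sum((p - y) ** 2 for p, y in zip(predicted, expected)) / n


# Hypothetical predictions and targets, chosen only for illustration.
y_hat = [2.5, 0.0, 2.0, 8.0]
y = [3.0, -0.5, 2.0, 7.0]
mse = mean_squared_error(y_hat, y)
print(mse)  # 0.375
```

The result is one number for the whole sample, which is what lets an optimizer compare two sets of weights.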

**5. What are the benefits of mini-batch gradient descent?**

- It is computationally more efficient than stochastic gradient descent.
- It improves generalization by finding flat minima.
- Mini-batches help approximate the gradient of the entire dataset, which helps to avoid local minima.

**6. What is meant by gradient descent?**

Gradient descent is an optimization algorithm used to find the parameter values that minimize the cost function. It is an iterative algorithm that moves in the direction of steepest descent, which is given by the negative of the gradient.

$$\Theta := \Theta - \alpha \frac{\partial}{\partial \Theta} J(\Theta)$$
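The update rule can be illustrated with a short sketch. The cost function J(Θ) = (Θ - 3)², the learning rate, and the step count below are assumptions picked so the result is easy to check by hand.

```python
def gradient_descent(grad, theta, alpha=0.1, steps=100):
    """Repeatedly apply theta := theta - alpha * dJ/dtheta."""
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
    return theta


# Assumed toy cost J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
def grad_j(theta):
    return 2.0 * (theta - 3.0)


theta_min = gradient_descent(grad_j, theta=0.0)
print(theta_min)  # very close to 3, the minimizer of J
```

With this cost, every step shrinks the distance to the minimum by a factor of 0.8. A learning rate above 1.0 would make the iterates diverge instead.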

**7. What is meant by backpropagation?**

- Forward-propagate the training data through the network to produce the output.
- Using the target value and the output value, compute the error derivative with respect to the output activation.
- Backpropagate by computing the derivative of the error with respect to the output activations of the previous layer, and continue this for all the hidden layers.
- Using the previously calculated derivatives for the output and all hidden layers, calculate the error derivatives with respect to the weights.
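The four steps can be sketched on the smallest possible network: one input, one sigmoid hidden unit, and one linear output. The architecture, the squared-error loss, and all numeric values are assumptions for illustration. The finite-difference check at the end confirms that the hand-derived gradients are consistent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w1, w2, x, y):
    """Forward pass of a 1-1-1 network followed by squared error."""
    h = sigmoid(w1 * x)
    return 0.5 * (w2 * h - y) ** 2


def backprop(w1, w2, x, y):
    # Step 1: forward pass, keeping the intermediate activations.
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    # Step 2: derivative of the error with respect to the output.
    d_out = y_hat - y
    # Step 3: propagate back to the hidden activation, then through sigmoid.
    d_h = d_out * w2
    d_z = d_h * h * (1.0 - h)
    # Step 4: derivatives with respect to the weights.
    return d_z * x, d_out * h  # (dL/dw1, dL/dw2)


w1, w2, x, y = 0.5, -1.5, 2.0, 1.0  # hypothetical values
g1, g2 = backprop(w1, w2, x, y)

# Compare against central finite differences.
eps = 1e-6
n1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
n2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
print(abs(g1 - n1) < 1e-6, abs(g2 - n2) < 1e-6)
```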

**8. What is meant by a convex hull?**

The convex hull represents the outer boundary of a group of data points; in a two-class problem there is one hull per class. Once the convex hulls have been created, we get the maximum margin hyperplane (MMH), which attempts to create the greatest separation between the two groups, as the perpendicular bisector of the shortest line between the two convex hulls.

**9. Do you have experience with Spark or other big data tools for machine learning?**

Spark and other big data tools are in high demand now because they can handle very large datasets at speed. Be honest if you don't have experience with the tools required, but also look at job descriptions and understand which tools come up most often.

**10. How do you handle missing data?**

One can find the missing data in a dataset and then either drop the affected rows or columns or replace the missing values with another value. In the Pandas library for Python, two useful methods are isnull() and dropna(); fillna() covers the replacement case.
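The two options, dropping and replacing, can be sketched without any library. The records and helper names below are hypothetical. In pandas, `DataFrame.isnull()`, `DataFrame.dropna()`, and `DataFrame.fillna()` serve the same purposes.

```python
def drop_missing(rows):
    """Keep only rows in which no value is None."""
    return [row for row in rows if all(v is not None for v in row.values())]


def fill_missing(rows, column, value):
    """Return copies of the rows with None in `column` replaced by `value`."""
    return [
        {**row, column: value if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 40, "income": 58000},
]
known_ages = [r["age"] for r in rows if r["age"] is not None]
mean_age = sum(known_ages) / len(known_ages)

dropped = drop_missing(rows)
filled = fill_missing(rows, "age", mean_age)
print(len(dropped), filled[1]["age"])  # 2 32.5
```

Dropping loses the income value of the second record, while mean filling keeps it at the cost of shrinking the variance of `age`.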

**11. What is meant by an autoencoder?**

An autoencoder is an unsupervised machine learning algorithm that uses backpropagation, where the target values are set to be equal to the inputs. Internally, it has a hidden layer that describes a code used to represent the input.

**12. Explain the role of Machine Learning in industry.**

Robots are replacing humans in many areas. This is because robots are programmed so that they can perform a task based on the data they gather from sensors. They learn from this data and behave intelligently.

**13. What are the different algorithm techniques in Machine Learning?**

- Reinforcement Learning
- Supervised Learning
- Unsupervised Learning
- Semi-supervised Learning
- Transduction
- Learning to Learn

**14. What is the difference between supervised and unsupervised machine learning?**

Supervised learning is a method that requires labeled training data, while unsupervised learning does not need labeled data.

**15. What is the advantage of Naive Bayes?**

A Naive Bayes classifier converges more quickly than discriminative models such as logistic regression, so it needs less training data.

Its main disadvantage is that it cannot learn interactions between features.

**16. What are the functions of Supervised Learning?**

- Classifications
- Speech recognition
- Regression
- Predict time series
- Annotate strings

**17. What are the functions of Unsupervised Learning?**

- Find clusters in the data
- Find low-dimensional representations of the data
- Find interesting directions in the data
- Find interesting coordinates and correlations
- Find novel observations

**18. How do you understand Machine Learning concepts?**

Machine learning is an application of artificial intelligence that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

**19. What is the role of the activation function?**

The activation function is used to introduce non-linearity into the neural network, helping it to learn more complex functions. Without it, the neural network would only be able to learn a linear function, that is, a linear combination of its input data.

**20. What is a Boltzmann Machine?**

A Boltzmann Machine is used to optimize the solution of a problem. The work of the Boltzmann machine is essentially to optimize the weights and the quantity for the given problem.

It uses a recurrent structure.

If we apply simulated annealing to a discrete Hopfield network, it becomes a Boltzmann Machine.

**21. What is Overfitting in Machine Learning?**

Overfitting in Machine Learning occurs when a statistical model describes random error or noise instead of the underlying relationship, or when a model is excessively complex.

**22. How can you avoid overfitting?**

- Lots of data
- Cross-validation

**23. What are the conditions when Overfitting happens?**

An important cause of overfitting is that the criterion used for training the model is not the same as the criterion used to judge the efficacy of the model: the model is fitted to the training data but judged on unseen data.

**24. What are the advantages of decision trees?**

- Decision trees are easy to interpret
- Nonparametric
- There are comparatively few parameters to tune

**25. What are the three stages to build the hypotheses or model in machine learning?**

- Model building
- Model testing
- Applying the model

**26. What are parametric models and Non-Parametric models?**

Parametric models are those with a finite number of parameters. To predict new data, you only need to know the parameters of the model.

Non-parametric models are those with an unbounded number of parameters, which allows for more flexibility. To predict new data, you need to know the parameters of the model and the state of the data that has been observed.

**27. What are some use cases of machine learning algorithms?**

- Fraud Detection
- Face detection
- Natural language processing
- Market Segmentation
- Text Categorization
- Bioinformatics

**28. What are the popular algorithms for Machine Learning?**

- Decision Trees
- Probabilistic networks
- Nearest Neighbor
- Support vector machines
- Neural Networks

**29. Define univariate, bivariate, and multivariate analysis.**

If an analysis involves only one variable, it is called univariate analysis, e.g. a pie chart or a histogram. If an analysis involves two variables, it is called bivariate analysis; for example, to see how population varies with age we can draw a scatter plot. A multivariate analysis involves more than two variables; for example, in regression analysis we see the effect of several variables on the response variable.

**30. How does missing value imputation lead to selection bias?**

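Case deletion: deleting the entire row because of one missing value in a specific column biases the sample whenever the values are not missing at random. Imputation by mean: the distribution might get biased, which affects, for instance, the standard deviation, regression, and correlation.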

**31. What is bootstrap sampling?**

Bootstrap sampling creates resampled data sets from the empirical data by sampling with replacement. The resampled sets are known as bootstrap replicates.
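A minimal sketch of drawing bootstrap replicates with the standard library follows. The sample values, the replicate count, and the fixed seed are assumptions for illustration.

```python
import random


def bootstrap_replicate(data, rng):
    """Resample len(data) points from data with replacement."""
    return rng.choices(data, k=len(data))


def bootstrap_means(data, n_replicates, seed=0):
    """Mean of each bootstrap replicate; seed fixed only for repeatability."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_replicates):
        replicate = bootstrap_replicate(data, rng)
        means.append(sum(replicate) / len(replicate))
    return means


sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8]  # hypothetical measurements
means = bootstrap_means(sample, n_replicates=1000)
print(min(means), max(means))
```

The spread of `means` gives an empirical estimate of how much the sample mean would vary across repeated experiments.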

**32. What is permutation sampling?**

Also known as a randomization test, it is the process of testing a statistic by repeatedly reshuffling the data labels to see whether the observed difference between two samples could have arisen by chance.
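The reshuffling idea can be sketched as a two-sided test on the difference of means. The choice of test statistic, the sample values, the number of permutations, and the seed are all assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Share of label reshuffles whose |mean difference| is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations


group_a = [12.1, 11.8, 12.5, 12.0, 11.9]  # hypothetical samples
group_b = [14.2, 13.9, 14.5, 14.1, 13.8]
p = permutation_p_value(group_a, group_b)
print(p)
```

Because the two hypothetical groups do not overlap at all, very few reshuffles reproduce a difference as large as the observed one, so the estimated p value is small.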

**33. What is total sum of squares?**

The summation of the squared differences of the individual points from the overall (grand) mean.

**34. What is sum of squares within?**

The summation of the squared differences of the individual points from their group mean.

**35. What is sum of squares between?**

The summation of the squared differences of the group means from the overall mean, counted once for each data point in the group.
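The three sums of squares can be computed together, which also shows that the total equals the within part plus the between part. The groups below are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def sums_of_squares(groups):
    """Return (total, within, between) sums of squares for a list of groups."""
    all_points = [x for g in groups for x in g]
    grand_mean = mean(all_points)
    sst = sum((x - grand_mean) ** 2 for x in all_points)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Each group mean is counted once per data point in that group.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return sst, ssw, ssb


groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # hypothetical
sst, ssw, ssb = sums_of_squares(groups)
print(sst, ssw, ssb)  # 60.0 6.0 54.0
```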

**36. What is p value?**

The p value is the probability of observing a statistic at least as extreme as the one actually obtained, under the assumption that the null hypothesis is true.

**37. What is R^2 value?**

It measures the goodness of fit of a linear regression model.

**38. What does it mean to have a high R^2 value?**

The statistic measures the percentage of variance in the dependent variable that can be explained by the independent variables together. A high R^2 value means that the model explains most of that variance.

**40. What are residuals in a regression model?**

A residual in a regression model is the difference between an actual observation and the value predicted for it by the regression model.

**41. What are fitted values? Calculate the fitted value for Y = 7X + 8 when X = 5.**

Fitted values are the responses of the model when the predictor values are plugged into it. For X = 5, the fitted value is 7(5) + 8 = 43.
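Fitted values, residuals, and R^2 can be tied together in one short sketch. Evaluating the line Y = 7X + 8 at X = 5 gives 7·5 + 8 = 43. The observations `ys` are hypothetical, and the function names are assumptions.

```python
def fitted(x, slope=7.0, intercept=8.0):
    """Fitted value of the line Y = 7X + 8."""
    return slope * x + intercept


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [15.5, 21.0, 29.5, 36.0, 42.5]  # hypothetical observations
predictions = [fitted(x) for x in xs]
residuals = [y - f for y, f in zip(ys, predictions)]  # actual minus fitted
r2 = r_squared(ys, predictions)
print(fitted(5.0), residuals, r2)
```

Here the residuals are small relative to the spread of `ys`, so R^2 comes out close to 1.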

**42. What pattern should residual vs fitted plots show in a regression analysis?**

No pattern. If the plot shows a pattern, the regression coefficients cannot be trusted.

**43. What is overfitting and underfitting?**

Overfitting occurs when a model is excessively complex and cannot generalize well; an overfitted model has poor predictive performance on new data. Underfitting occurs when the model is not able to capture the trends in the data.

**44. Define precision and recall?**

Recall = True Positives/(True Positives + False Negatives), Precision = True Positives/(True Positives + False Positives).
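Both ratios can be computed directly from label lists. This is a minimal sketch for binary labels; the label encoding (1 for the positive class) and the example data are assumptions.

```python
def precision_recall(actual, predicted):
    """Precision and recall for binary labels, where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


actual = [1, 1, 1, 1, 0, 0, 0, 0]     # hypothetical ground truth
predicted = [1, 1, 1, 0, 1, 0, 0, 0]  # hypothetical model output
precision, recall = precision_recall(actual, predicted)
print(precision, recall)  # 0.75 0.75
```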

**45. What are type 1 and type 2 errors?**

False positives are termed Type 1 errors; false negatives are termed Type 2 errors.

**46. What is ensemble learning?**

The art of combining multiple learning algorithms to achieve a model with higher predictive power, for example through bagging or boosting.

**47. What is the difference between supervised and unsupervised machine learning algorithms?**

In supervised learning we use a labelled dataset and try to learn from that data; unsupervised modeling involves data which is not labelled.

**48. What is named entity recognition?**

It is the task of identifying and classifying entities in textual data, such as people, places, organizations, and dates, to answer questions like "who, when, where, what".

**49. What is tf-idf?**

It is a measure of the weight of a term in text data, used mainly in text mining. It signifies how important a word is to a document.

tf -> term frequency (count of the term appearing in the document)

idf -> inverse document frequency (lower for terms that appear in many documents)

tfidf -> tf * idf
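A minimal sketch of the product follows, assuming a raw-count tf and idf = log(N / document frequency). Other weighting variants exist and libraries often add smoothing, so treat the exact formula and the toy corpus as assumptions.

```python
import math


def tf_idf(term, document, corpus):
    """tf * idf with raw-count tf and idf = log(N / document frequency)."""
    tf = document.count(term)
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Hypothetical tokenized corpus.
corpus = [
    ["deep", "learning", "uses", "neural", "networks"],
    ["machine", "learning", "uses", "data"],
    ["neural", "networks", "learn", "from", "data"],
]
score_deep = tf_idf("deep", corpus[0], corpus)
score_learning = tf_idf("learning", corpus[0], corpus)
print(score_deep, score_learning)
```

"deep" appears in only one document, so it scores higher in the first document than "learning", which appears in two.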

**50. What is the difference between regression and deep neural networks? Is regression better than neural networks?**

In some applications neural networks fit better than regression, which usually happens when non-linearity is involved. On the other hand, a linear regression model has fewer parameters to estimate than a neural network for the same set of input variables. A neural network therefore needs more data to generalize well and to capture nonlinear associations.

**51. How are node values calculated in a feed forward neural network?**

The weights are multiplied by the node/input values and the products are summed up to generate the value of the next node.
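The weighted sum can be sketched for a tiny two-input network. The weights and inputs are hypothetical, and the optional bias term is an addition not mentioned in the answer above; no activation function is applied here.

```python
def node_value(inputs, weights, bias=0.0):
    """Weighted sum of the incoming values plus an optional bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def layer_values(inputs, weight_rows, biases):
    """One node per row of weights."""
    return [node_value(inputs, row, b) for row, b in zip(weight_rows, biases)]


inputs = [2.0, 3.0]                          # hypothetical input nodes
hidden_weights = [[1.0, 1.0], [-1.0, 1.0]]   # one row per hidden node
hidden = layer_values(inputs, hidden_weights, biases=[0.0, 0.0])
output = node_value(hidden, [2.0, -1.0])
print(hidden, output)  # [5.0, 1.0] 9.0
```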

**52. Name two activation functions used in deep neural networks?**

Sigmoid, softmax, ReLU, leaky ReLU, tanh.

**53. What is the use of activation functions in neural networks?**

Activation functions are used to capture the non-linearity present in the data.

**54. How are the weights calculated which determine interactions in neural networks?**

Training sets the weights to optimize predictive accuracy, typically by minimizing a loss function with gradient descent and backpropagation.

**55. Which layer in a deep learning model would capture a more complex or higher order interaction?**

The last layer.

**56. What is gradient descent?**

It consists of iteratively minimizing a loss function to find the optimal weights for a neural network.

**57. Imagine a loss function vs weights plot depicting a gradient descent. At What point of the curve would we achieve optimal weights?**

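At a minimum of the curve. Ideally this is the global minimum; in practice gradient descent may settle in a local minimum.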

**58. How does the slope of the tangent to the curve of loss function vs weights help us in getting optimal weights for a neural network?**

The slope of the curve at any point gives us the direction in which to move, i.e. how to change the weights to achieve a smaller value of the loss function.

**59. What is learning rate in gradient descent?**

A value that controls how large a step we take towards the optimal weights. The weights are changed by subtracting the product of the learning rate and the slope.

**60. If in backward propagation you have gone through 9 iterations of calculating slopes and updating the weights, how many times must you have done forward propagation?**

9

**61. How does the ReLU activation function work? Give its value for -5 and +7.**

For all x>=0 the output is x, and for all x<0 the output is 0. With the ReLU activation function, -5 gives 0 and +7 gives +7.
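ReLU is short enough to write out in full. The slope helper is an addition for illustration; it shows why a unit whose input is always negative receives no gradient.

```python
def relu(x):
    """Return x for x >= 0 and 0 otherwise."""
    return max(0.0, x)


def relu_slope(x):
    """Slope used in backpropagation: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


print(relu(-5), relu(7))              # 0.0 7
print(relu_slope(-5), relu_slope(7))  # 0.0 1.0
```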

**62. What is a batch in deep neural networks?**

It is common to calculate slopes on only a subset of the data, known as a batch, for computational efficiency.

**63. What is an epoch in Deep neural networks?**

When the entire dataset has been passed through both forward and backward propagation, with the weights updated along the way, the model is said to have completed 1 epoch.

**64. Imagine you have 2000 training samples and batch size is set to 200 how many iterations will it take to complete 1 epoch?**

10
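The count follows from dividing the number of samples by the batch size and rounding up. A minimal sketch, with hypothetical helper names and a list of integers standing in for the training samples:

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of weight updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def batches(data, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


samples = list(range(2000))  # stand-in for 2000 training samples
n_batches = len(list(batches(samples, 200)))
print(iterations_per_epoch(2000, 200), n_batches)  # 10 10
```

Rounding up matters when the batch size does not divide the dataset evenly: 2001 samples would need 11 iterations, the last with a single sample.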

**65. What is stochastic gradient descent?**

When slopes are calculated on one batch at a time (in the strict sense, one sample at a time), it is referred to as stochastic gradient descent.

**66. What is data normalization?**

Data normalization is a technique used to scale all values in a dataset to fit within a specific range. It is important for achieving good convergence in deep learning models.

**67. How can we normalize the data? State a method used for the same?**

Standardization (z-score): (observation - feature mean)/standard deviation.
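The usual z-score subtracts the feature mean from each observation and divides by the standard deviation. The sketch below assumes the population standard deviation and a hypothetical feature column; using the sample standard deviation is equally common.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


feature = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical feature column
scaled = standardize(feature)
print(scaled)
```

After scaling, the column has mean 0 and standard deviation 1, and values below the original mean are negative.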

**68. Explain dying neuron problem?**

Occurs when the input to a neuron is < 0 for all rows of the data. With a ReLU activation function the neuron then always produces an output of 0, so its slope is 0 and its weights stop updating.

**69. Explain vanishing gradients?**

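Occurs when many layers have very small slopes, for example when a tanh activation function is used in a deep network. The gradients shrink as they are multiplied back through the layers, so the early layers learn very slowly.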

**70. What is model capacity?**

It describes how complex a pattern a model can capture. In deep neural networks it grows with the number of hidden layers and nodes.

**71. What is regularization?**

The process of constraining a model, for example by adding a penalty term or adjusting tuning parameters, in order to reduce overfitting.

**72. What are hyper parameters in deep neural networks?**

Hyperparameters are settings that describe the structure of a neural network, such as the number of layers and nodes. Other hyperparameters, such as the learning rate and batch size, decide how the model should be trained to achieve optimum results.

**73. What is dropout in deep neural networks?**

Dropout is a regularization technique used to avoid overfitting: randomly selected nodes are ignored during training, which increases the generalizing power of the network.
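A minimal sketch of dropout on one layer's activations follows. It uses the "inverted" variant, which rescales the surviving activations during training; that choice, the rate of 0.5, and the fixed seed are assumptions for illustration.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate` and
    rescale the survivors so the expected value is unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
hidden = [1.0] * 10     # hypothetical hidden-layer activations
dropped = dropout(hidden, rate=0.5, rng=rng)
unchanged = dropout(hidden, rate=0.5, rng=rng, training=False)
print(dropped)
print(unchanged)
```

At prediction time the layer is left untouched, which is why the function takes a `training` flag.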

**74. Explain exploding gradients?**

Occurs when large error gradients accumulate during training, which results in very large updates to the weights.

**75. What are autoencoders?**

A neural network architecture trained with backpropagation in which the targets are set equal to the inputs.

**76. What is homoscedasticity and heteroscedasticity?**

When the errors have equal (constant) variance it is termed homoscedasticity; when the variance of the errors is unequal it is termed heteroscedasticity.

**77. Difference between adjusted R^2 and R^2**

R^2 accounts for the variation of the dependent variable explained by the independent variables. Adjusted R^2 additionally penalizes the number of predictors, so it increases only when an added variable improves the model more than would be expected by chance.

**78. In case residual vs fitted plots is showing a pattern and is not distributed evenly or has some outliers how should it be handled?**

In such a case a variable transformation should be tried, for example log, x^2, or x^3.

**79. Difference between collinearity and correlation?**

Correlation is a measure of the strength of the linear relationship between two variables, whereas if in a linear regression one of the predictors is derived from or associated with another predictor, the two are said to be collinear.

**80. What are the different layers of Autoencoders? Explain briefly.**

An autoencoder contains three layers:

**Encoder**

The encoder is used to compress the input into a latent space representation. It encodes the input image as a compressed representation in a reduced dimension. The compressed image is a distorted version of the original image.

**Code**

The code layer is used to represent the compressed input which is fed to the decoder.

**Decoder**

The decoder layer decodes the encoded image back to its original dimension. The decoded image is a lossy reconstruction of the original image, automatically reconstructed from the latent space representation.