300+ TOP Deep Learning Interview Questions and Answers

Deep Learning Interview Questions for Freshers and Experienced:

1. What is Deep Learning?
Deep learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised, or unsupervised.

2. Which data visualization libraries do you use, and why are they useful?
It is valuable to explain your views on proper data visualization and your personal preferences when it comes to tools. Popular choices include R's ggplot, Python's seaborn and matplotlib, and tools such as Plot.ly and Tableau.

3. Where do you regularly source data-sets?
Questions of this type are often the real tie-breakers. Anyone going into an interview should be prepared for them, because the answer shows how genuinely interested you are in machine learning.

4. What is the cost function?
A cost function is a measure of the accuracy of the neural network with respect to a given training sample and the expected output. It is a single value, not a vector, because it rates the performance of the neural network as a whole. A common example is the mean squared error: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\hat{Y}_i - Y_i)^2$
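The mean squared error can be computed in a few lines. This is a minimal sketch; the function name and the sample values are made up for illustration.

```python
def mean_squared_error(predicted, expected):
    """Return the mean of the squared differences between two equal-length sequences."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    n = len(predicted)
    return sum((p - y) ** 2 for p, y in zip(predicted, expected)) / n


# Hypothetical values chosen only for illustration.
predictions = [2.5, 0.0, 2.0, 8.0]
targets = [3.0, -0.5, 2.0, 7.0]
print(mean_squared_error(predictions, targets))  # 0.375
```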

5. What are the benefits of mini-batch gradient descent?

  • It is computationally more efficient than stochastic gradient descent.
  • It improves generalization by finding flat minima.
  • Mini-batches help approximate the gradient of the entire training set, which may help avoid local minima.

6. What is meant by gradient descent?
Gradient descent is a basic optimization algorithm used to find the values of the parameters that minimize the cost function. It is an iterative algorithm that moves in the direction of steepest descent, as defined by the negative of the gradient.

$$\Theta := \Theta - \alpha \frac{\partial}{\partial \Theta} J(\Theta)$$
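The update rule can be sketched as a short loop. The cost function J(theta) = (theta - 3)^2, the learning rate, and the step count below are assumptions chosen only to make the example concrete.

```python
def gradient_descent(grad, theta, alpha=0.1, steps=100):
    """Repeat the update theta := theta - alpha * dJ/dtheta a fixed number of times."""
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
    return theta


# Illustrative cost J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
def grad_J(theta):
    return 2.0 * (theta - 3.0)


theta_min = gradient_descent(grad_J, theta=0.0)
print(round(theta_min, 6))  # 3.0
```

Each step shrinks the distance to the minimum at theta = 3 by a constant factor, so the loop settles there.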

7. What is meant by backpropagation?

  • Forward-propagate the training data through the network to generate the output.
  • Use the target value and the output value to compute the error derivative with respect to the output activations.
  • Backpropagate to compute the derivative of the error with respect to the output activations of the previous layer, and continue this for all the hidden layers.
  • Use the previously calculated derivatives for the output and all hidden layers to compute the error derivatives with respect to the weights.

8. What is meant by a convex hull?
The convex hull represents the outer boundary of a group of data points belonging to one of two classes. Once the convex hulls have been created, we get the maximum margin hyperplane (MMH), which attempts to create the greatest separation between the two groups, as a perpendicular bisector between the two convex hulls.

9. Do you have experience with Spark or other big data tools for machine learning?
Spark and other big data tools are in high demand because they can handle very large data sets at speed. Be honest if you don't have experience with the tools required, but also take a look at job descriptions to see which tools come up.

10. How do you handle missing data?
You can find the missing data in a data set and then either drop those rows or columns or replace the missing entries with another value. In Python's pandas library, two useful methods are isnull() and dropna(); fillna() replaces missing values with a placeholder.

11. What is meant by an auto-encoder?
An auto-encoder is an autonomous machine learning algorithm that uses backpropagation, where the target values are set to be equal to the inputs provided. Internally, it has a hidden layer that describes a code used to represent the input.

12. Explain the use of Machine Learning in industry.
Robots are replacing humans in many areas. This is because robots are programmed so that they can perform tasks based on the data they gather from sensors. They learn from this data and behave intelligently.

13. What are the different algorithm techniques in Machine Learning?

  1. Reinforcement Learning
  2. Supervised Learning
  3. Unsupervised Learning
  4. Semi-supervised Learning
  5. Transduction
  6. Learning to Learn

14. What is the difference between supervised and unsupervised machine learning?
Supervised learning requires labeled training data, while unsupervised learning does not need labeled data.

15. What is the advantage of Naive Bayes?
The classifier converges more quickly than discriminative models, so it needs less training data.
Its main disadvantage is that it cannot learn interactions between features.

16. What are the functions of Supervised Learning?

  1. Classifications
  2. Speech recognition
  3. Regression
  4. Predict time series
  5. Annotate strings

17. What are the functions using Unsupervised Learning?

  • To find clusters in the data
  • To find low-dimensional representations of the data
  • To find interesting directions in the data
  • To find interesting coordinates and correlations
  • To find novel observations

18. How do you understand Machine Learning concepts?
Machine learning is an application of artificial intelligence that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

19. What is the role of the activation function?
The activation function introduces non-linearity into the neural network, helping it to learn more complex functions. Without it, the neural network would only be able to learn a linear function, that is, a linear combination of its input data.

20. What is a Boltzmann Machine?
A Boltzmann Machine is used to optimize the solution of a problem. Its job is essentially to optimize the weights and quantities for the given problem.

It uses a recurrent structure.
If we apply simulated annealing to a discrete Hopfield network, it becomes a Boltzmann Machine.

21. What is Overfitting in Machine Learning?
Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship, typically because the model is excessively complex.

22. How can you avoid overfitting?

  • Lots of data
  • Cross-validation

23. Under what conditions does overfitting happen?
One important cause of overfitting is that the criterion used to train the model is not the same as the criterion used to judge the efficacy of the model.

24. What are the advantages of decision trees?

  • Decision trees are easy to interpret
  • Nonparametric
  • There are comparatively few parameters to tune

25. What are the three stages to build the hypotheses or model in machine learning?

  • Model building
  • Model testing
  • Applying the model

26. What are parametric models and Non-Parametric models?

Parametric models are those with a finite number of parameters. To predict new data, you only need to know the parameters of the model.
Non-parametric models are those with an unbounded number of parameters, which allows for more flexibility. To predict new data, you need to know the parameters of the model and the state of the data that has been observed.

27. What are some use cases of machine learning algorithms?

  • Fraud Detection
  • Face detection
  • Natural language processing
  • Market Segmentation
  • Text Categorization
  • Bioinformatics

28. What are the popular algorithms for Machine Learning?

  • Decision Trees
  • Probabilistic networks
  • Nearest Neighbor
  • Support vector machines
  • Neural Networks

29. Define univariate, bivariate, and multivariate analysis.
If an analysis involves only one variable, it is called univariate analysis, e.g. a pie chart or a histogram. If an analysis involves two variables, it is called bivariate analysis; for example, to see how population varies with age we can draw a scatter plot. A multivariate analysis involves more than two variables; for example, in regression analysis we see the effect of several variables on the response variable.

30. How does missing value imputation lead to selection bias?
Case deletion: deleting the entire row because of one missing value in a specific column can leave a sample that is no longer representative. Imputation by mean: the distribution might get biased, which affects, for instance, the standard deviation, regression, and correlation.

31. What is bootstrap sampling?
Bootstrap sampling creates resampled data sets from the empirical data by sampling with replacement. The resampled sets are known as bootstrap replicates.
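Bootstrap resampling can be sketched with the standard library. The function name, the sample values, and the choice of the mean as the statistic are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_replicates(data, statistic, n_replicates, seed=0):
    """Resample `data` with replacement and compute `statistic` on each resample."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        resample = rng.choices(data, k=len(data))  # sampling with replacement
        replicates.append(statistic(resample))
    return replicates


# Made-up sample, for illustration only.
sample = [4.1, 5.0, 5.5, 6.2, 7.3, 8.0]
means = bootstrap_replicates(sample, statistics.mean, n_replicates=1000)
print(len(means), min(means), max(means))
```

The spread of the replicate means gives a rough picture of how much the sample mean would vary from sample to sample.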

32. What is permutation sampling?
Also known as a randomization test: the data labels are repeatedly reshuffled and the test statistic is recomputed, to judge whether the observed difference between two samples could have arisen by chance.

33. What is total sum of squares?
The sum of the squared differences between the individual data points and the overall (grand) mean.

34. What is sum of squares within?
The sum of the squared differences between the individual data points and their group mean.

35. What is sum of squares between?
The sum of the squared differences between each group mean and the overall (grand) mean, counted once for each data point in the group.
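The three sums of squares are linked by the identity total = within + between. A minimal sketch follows; the function name and the two groups are made up for illustration.

```python
def sums_of_squares(groups):
    """Return (total, within, between) sums of squares for a list of groups."""
    all_points = [x for group in groups for x in group]
    grand_mean = sum(all_points) / len(all_points)

    ss_total = sum((x - grand_mean) ** 2 for x in all_points)
    ss_within = 0.0
    ss_between = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_within += sum((x - group_mean) ** 2 for x in group)
        # Each point in the group contributes the squared gap between
        # its group mean and the grand mean.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
    return ss_total, ss_within, ss_between


# Made-up groups, for illustration only.
total, within, between = sums_of_squares([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(total, within, between)  # 17.5 4.0 13.5
```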

36. What is p value?
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, under the assumption that the null hypothesis is true.

37. What is R^2 value?
It measures the goodness of fit of a linear regression model.

38. What does it mean to have a high R^2 value?
The statistic measures the percentage of variance in the dependent variable that can be explained by the independent variables together, so a high R^2 means the model explains most of that variance.
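R^2 can be computed from the residual and total sums of squares. This is a small sketch using the usual definition R^2 = 1 - SS_residual / SS_total; the data values are invented for illustration.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total


# Invented observations and model predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(actual, predicted), 2))  # 0.98
```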

40. What are residuals in a regression model?
A residual is the difference between an actual observation and the value predicted for it by the regression model.

41. What are fitted values, calculate fitted value for Y=7X+8, when X =5?
Fitted values are the responses of the model when the predictor values are plugged into it. For Y = 7X + 8 with X = 5, the fitted value is 7(5) + 8 = 43.

42. What pattern should residual vs fitted plots show in a regression analysis?
No pattern. If the plot shows a pattern, the regression coefficients cannot be trusted.

43. What is overfitting and underfitting?
Overfitting occurs when a model is excessively complex and cannot generalize well; an overfitted model has poor predictive performance on new data. Underfitting occurs when the model is not able to capture the trends in the data.

44. Define precision and recall?
Recall = True Positives/(True Positives + False Negatives), Precision = True Positives/(True Positives + False Positive).
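Both formulas can be checked on a small set of labels. This is a sketch with made-up labels; returning 0.0 when a denominator is zero is a convention chosen here, not something the definitions dictate.

```python
def precision_recall(actual, predicted, positive=1):
    """Return (precision, recall) for the given positive class label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # Assumed convention: report 0.0 when there is nothing to divide by.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 1]
print(precision_recall(actual, predicted))  # (0.75, 0.75)
```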

45. What are type 1 and type 2 errors?
False positives are termed Type 1 errors; false negatives are termed Type 2 errors.

46. What is ensemble learning?
The art of combining multiple learning algorithms to achieve a model with higher predictive power, for example by bagging or boosting.

47. What is the difference between supervised and unsupervised machine learning algorithms?
In supervised learning we use a labelled dataset and try to learn from that data; unsupervised modeling involves data that is not labelled.

48. What is named entity recognition?
It is the task of identifying and classifying entities in textual data, such as people, places, dates, and organizations, in order to answer questions like “who, when, where, what”.

49. What is tf-idf?
It is a measure of the weight of a term in text data, used mainly in text mining. It signifies how important a word is to a document.

tf -> term frequency (count of the term appearing in the document)

idf -> inverse document frequency (terms that appear in fewer documents get a higher value)

tfidf -> tf * idf
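A minimal sketch of the calculation follows. It assumes tf is the raw count of the term in the document and idf = log(N / df), where N is the number of documents and df the number of documents containing the term; this is one common definition among several, and the tiny corpus is invented.

```python
import math


def tf_idf(term, document, corpus):
    """tf * idf, with tf = raw count in the document and idf = log(N / df)."""
    tf = document.count(term)
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0  # the term appears nowhere in the corpus
    idf = math.log(len(corpus) / df)
    return tf * idf


# Invented corpus of tokenized documents.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "deep learning models learn features".split(),
]
print(tf_idf("the", corpus[0], corpus))
print(tf_idf("mat", corpus[0], corpus))
```

Although "the" occurs twice in the first document and "mat" only once, "mat" scores higher because it appears in fewer documents.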

50. What is the difference between regression and deep neural networks, is regression better than neural networks?
In some applications neural networks fit better than regression; this usually happens when non-linearities are involved. On the other hand, a linear regression model has fewer parameters to estimate than a neural network for the same set of input variables. A neural network therefore needs more data during optimization in order to generalize well and capture the non-linear associations.

51. How are node values calculated in a feed forward neural network?
The node or input values are multiplied by the weights and summed (typically with a bias added and an activation function applied) to produce the value of each node in the next layer.
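The weighted sum for a single node can be sketched as follows. The network shape, the weights, and the optional bias and activation arguments are assumptions chosen for illustration.

```python
def node_value(inputs, weights, bias=0.0, activation=None):
    """Weighted sum of the inputs plus a bias, optionally passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z) if activation else z


# Two inputs feed two hidden nodes, which feed one output node.
inputs = [2.0, 3.0]
hidden = [node_value(inputs, [1.0, 1.0]), node_value(inputs, [-1.0, 1.0])]
output = node_value(hidden, [2.0, -1.0])
print(hidden, output)  # [5.0, 1.0] 9.0
```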

52. Name two activation functions used in deep neural networks?
Sigmoid, softmax, relu, leaky relu, tanh.

53. What is the use of activation functions in neural networks?
Activation functions introduce non-linearity, so that the network can model the non-linear relationships present in the data.

54. How are the weights calculated which determine interactions in neural networks?
The training model sets weights to optimize predictive accuracy.

55. Which layer in a deep learning model would capture a more complex or higher-order interaction?
The last layer, since each layer builds on the outputs of the layers before it.

56. What is gradient descent?
It is an iterative procedure that minimizes a loss function to find the optimal weights for a neural network.

57. Imagine a loss function vs weights plot depicting a gradient descent. At What point of the curve would we achieve optimal weights?
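At the minimum of the curve, ideally the global minimum. In practice gradient descent may settle in a local minimum.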

58. How does the slope of the tangent to the loss function vs weights curve help us find optimal weights for a neural network?
The slope of the curve at any point gives us the direction in which to move, i.e. which weights to try next in order to reduce the value of the loss function.

59. What is learning rate in gradient descent?
A value that controls how large a step we take towards the optimal weights. The weights are changed by subtracting the product of the learning rate and the slope.

60. If in backward propagation you have gone through 9 iterations of calculating slopes and updated the weights simultaneously, how many times you must have done forward propagation?
9

61. How does the ReLU activation function work? Define its value for -5 and +7.
For all non-negative x the output is x, and for all negative x the output is 0. With the ReLU activation function, -5 returns 0 and +7 returns +7.
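The definition translates directly into code; a minimal sketch:

```python
def relu(x):
    """Return x for non-negative inputs and 0 otherwise."""
    return x if x >= 0 else 0


print(relu(-5), relu(7))  # 0 7
```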

62. What is a batch in deep neural networks?
It is common that we calculate slopes only on a subset of the data known as batch for computational efficiencies.

63. What is an epoch in Deep neural networks?
One epoch has passed when the entire dataset has gone through forward and backward propagation once, with the weights updated along the way.

64. Imagine you have 2000 training samples and batch size is set to 200 how many iterations will it take to complete 1 epoch?
10
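The count is simply the number of samples divided by the batch size. In the sketch below, rounding up so that a final partial batch still counts as an iteration is an assumption; some training loops drop the last partial batch instead.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of weight updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


print(iterations_per_epoch(2000, 200))  # 10
```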

65. What is stochastic gradient descent?
When slopes are calculated on one batch at a time (in the strictest sense, on a single training example at a time), it is referred to as stochastic gradient descent.

66. What is data normalization?
Data normalization is a technique used to scale all values in a dataset to fit within a specific range. It is important for achieving good convergence in deep learning models.

67. How can we normalize the data? State a method used for the same?
(Observation - feature mean) / standard deviation.
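This standardization can be sketched with the standard library. Using the population standard deviation (rather than the sample one) is an assumption, and the values are made up.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the standard deviation (z-scores)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation; must be non-zero
    return [(v - mean) / std for v in values]


# Made-up feature column with mean 5 and standard deviation 2.
scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scaled)
```

After scaling, the feature has mean 0 and standard deviation 1.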

68. Explain dying neuron problem?
Occurs when a neuron takes a negative value for all rows of the data. With the ReLU activation function it then always produces an output of 0, so its slope is also 0 and its weights stop updating.

69. Explain vanishing gradients?
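Occurs when many layers have small slopes, for example when a tanh activation function is used in a deep network. The slopes are multiplied together during backpropagation, so the updates reaching the early layers become close to zero.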

70. What is model capacity?
It describes how complex a model can get. In deep neural networks it increases with the number of hidden layers included.

71. What is regularization?
The process of constraining a model, by adding a penalty or adjusting tuning parameters, so that it is less prone to overfitting.

72. What are hyper parameters in deep neural networks?
Hyperparameters are settings that describe the network structure in a neural network. Some hyperparameters also decide how the model should be trained to achieve optimum results.

73. What is dropout in deep neural networks?
Dropout is a regularization technique used to avoid overfitting: a random subset of nodes is ignored at each training step, which increases the generalizing power of the model.

74. Explain exploding gradients?
Occurs when large error gradients accumulate during training, resulting in very large updates to the weights.

75. What are autoencoders?
A neural network architecture trained with backpropagation in which the targets are set equal to the inputs.

76. What are homoscedasticity and heteroscedasticity?
When the errors have equal (constant) variance it is termed homoscedasticity; when the errors have unequal variance it is termed heteroscedasticity.

77. Difference between adjusted R^2 and R^2
R^2 accounts for the variation of the dependent variable explained by the independent variables. Adjusted R^2 additionally penalizes the number of predictors, so it rises only when an added variable contributes more than would be expected by chance.

78. In case residual vs fitted plots is showing a pattern and is not distributed evenly or has some outliers how should it be handled?
In such a case a variable transformation should be tried, for example log, x^2, or x^3.

79. Difference between collinearity and correlation?
Correlation is the measure of the strength of the linear relationship between two variables, whereas if in a linear regression one of the predictors is derived from or associated with another predictor, the two are said to be collinear.

80. What are the different layers of Autoencoders? Explain briefly.
An autoencoder contains three layers:

  • Encoder

The encoder is used to compress the input into a latent space representation. It encodes the input image as a compressed representation in a reduced dimension. The compressed image is a distorted version of the original image.

  • Code

The code layer is used to represent the compressed input which is fed to the decoder.

  • Decoder

The decoder layer decodes the encoded image back to its original dimension. The decoded image is a lossy reconstruction of the original image, automatically reconstructed from the latent space representation.

300+ [LATEST] Deep Learning Interview Questions and Answers

Q1. What Is An Auto-encoder?

An autoencoder is an autonomous Machine learning algorithm that uses backpropagation principle, where the target values are set to be equal to the inputs provided. Internally, it has a hidden layer that describes a code used to represent the input.

Some Key Facts about the autoencoder are as follows:-

  • It is an unsupervised ML algorithm similar to Principal Component Analysis
  • It minimizes the same objective function as Principal Component Analysis
  • It is a neural network
  • The neural network’s target output is its input

Q2. Weight Initialization In Neural Networks?

Weight initialization is a very important step. Bad weight initialization can prevent a network from learning. Good initialization can lead to quicker convergence and better overall error. Biases can be generally initialized to zero. The general rule for setting the weights is to be close to zero without being too small.

Q3. What Is A Model Capacity?

The ability to approximate any given function. The higher the model capacity, the larger the amount of information that can be stored in the network.

Q4. What Are The Benefits Of Mini-batch Gradient Descent?

  1. Computationally efficient compared to stochastic gradient descent.
  2. Improve generalization by finding flat minima.
  3. Improves convergence: by using mini-batches we approximate the gradient of the entire training set, which might help to avoid local minima.

Q5. What Are Hyperparameters, Provide Some Examples?

Hyperparameters, as opposed to model parameters, can’t be learned from the data; they are set before the training phase.

Learning rate:

It determines how fast we want to update the weights during optimization. If the learning rate is too small, gradient descent can be slow to find the minimum; if it’s too large, gradient descent may not converge (it can overshoot the minimum). It’s considered to be the most important hyperparameter.

Number of epochs:

Epoch is defined as one forward pass and one backward pass of all training data.

Batch size:

The number of training examples in one forward/backward pass.

Q6. Explain The Following Three Variants Of Gradient Descent: Batch, Stochastic And Mini-batch?

Stochastic Gradient Descent:

Uses only a single training example to calculate the gradient and update the parameters.

Batch Gradient Descent:

Calculate the gradients for the whole dataset and perform just one update at each iteration.

Mini-batch Gradient Descent:

Mini-batch gradient descent is a variation of stochastic gradient descent where, instead of a single training example, a mini-batch of samples is used. It’s one of the most popular optimization algorithms.
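The three variants differ only in how many examples feed each update, which the following sketch makes explicit through a batch_size argument. The one-weight model y = w * x, the noise-free data, and the learning rate are assumptions chosen to keep the example small.

```python
import random


def train(xs, ys, batch_size, lr=0.01, epochs=200, seed=0):
    """Fit y = w * x by gradient descent on the mean squared error.

    batch_size=1 gives stochastic, batch_size=len(xs) gives batch,
    and anything in between gives mini-batch gradient descent.
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of mean((w*x - y)^2) over the batch with respect to w.
            grad = sum(2 * xs[i] * (w * xs[i] - ys[i]) for i in batch) / len(batch)
            w -= lr * grad
    return w


# Made-up, noise-free data generated from y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
for name, size in [("stochastic", 1), ("mini-batch", 2), ("batch", len(xs))]:
    print(name, round(train(xs, ys, batch_size=size), 4))
```

On this toy problem all three settings recover a weight close to 2; they differ in how many updates they make per epoch (4, 2, and 1 respectively).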

Q7. What Is Data Normalization And Why Do We Need It?

Data normalization is a very important preprocessing step, used to rescale values to fit in a specific range to ensure better convergence during backpropagation. In general, it boils down to subtracting the mean of each feature from every data point and dividing by the feature's standard deviation.

Q8. What Is An Autoencoder?

An autoencoder is an artificial neural network able to learn a representation (encoding) for a set of data without any supervision. The network learns by copying its input to its output; typically the internal representation has smaller dimensions than the input vector, so that the network learns efficient ways of representing the data. An autoencoder consists of two parts: an encoder, which maps the inputs to an internal representation, and a decoder, which converts the internal state to the outputs.

Q9. What Is A Boltzmann Machine?

Boltzmann Machine is used to optimize the solution of a problem. The work of Boltzmann machine is basically to optimize the weights and the quantity for the given problem.

Some important points about Boltzmann Machine −

  • It uses recurrent structure.
  • It consists of stochastic neurons, each of which is in one of two possible states, either 1 or 0.
  • The neurons in this are either in adaptive (free state) or clamped (frozen state).
  • If we apply simulated annealing on discrete Hopfield network, then it would become Boltzmann Machine.

Q10. What Is Weight Initialization In Neural Networks?

Weight initialization is one of the very important steps. A bad weight initialization can prevent a network from learning but good weight initialization helps in giving a quicker convergence and a better overall error. Biases can be generally initialized to zero. The rule for setting the weights is to be close to zero without being too small.

Q11. What Is A Backpropagation?

Backpropagation is a training algorithm used for multilayer neural networks. It moves the error information from the end of the network to all the weights inside the network and thus allows for efficient computation of the gradient.

The backpropagation algorithm can be divided into several steps:

  1. Forward propagation of training data through the network in order to generate output.
  2. Use target value and output value to compute error derivative with respect to output activations.
  3. Backpropagate to compute the derivative of the error with respect to output activations in the previous layer and continue for all hidden layers.
  4. Use the previously calculated derivatives for output and all hidden layers to calculate the error derivative with respect to weights.
  5. Update the weights.
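The five steps above can be sketched on a deliberately tiny network: one input, one sigmoid hidden unit, and one linear output with squared error. The architecture, starting weights, target, and learning rate are all assumptions chosen to keep the arithmetic visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def backprop_step(x, target, w1, w2, lr=0.1):
    """One training step for x -> sigmoid(w1 * x) -> w2 * h with squared error."""
    # 1. Forward propagation.
    h = sigmoid(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2
    # 2. Error derivative with respect to the output activation.
    d_y = y - target
    # 3. Backpropagate to the hidden activation, then through the sigmoid.
    d_h = d_y * w2
    d_z = d_h * h * (1.0 - h)
    # 4. Error derivatives with respect to the weights.
    d_w2 = d_y * h
    d_w1 = d_z * x
    # 5. Update the weights.
    return w1 - lr * d_w1, w2 - lr * d_w2, loss


w1, w2 = 0.5, -0.3  # arbitrary starting weights
losses = []
for _ in range(200):
    w1, w2, loss = backprop_step(x=1.0, target=0.8, w1=w1, w2=w2)
    losses.append(loss)
print(losses[0] > losses[-1])  # True
```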

Q12. Is It Ok To Connect From A Layer 4 Output Back To A Layer 2 Input?

Yes, this can be done, provided that the layer 4 output comes from the previous time step, as in an RNN. We also need to assume that the previous input batch is sometimes correlated with the current batch.

Q13. What Is A Dropout?

Dropout is a regularization technique for reducing overfitting in neural networks. At each training step we randomly drop out (set to zero) set of nodes, thus we create a different model for each training case, all of these models share weights. It’s a form of model averaging.
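A minimal sketch of applying dropout to one layer's activations follows. Rescaling the surviving activations by 1 / (1 - rate), often called inverted dropout, is an assumption added here so that the expected activation stays the same; the function name and values are illustrative.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    # Inverted dropout: dividing by `keep` preserves the expected activation.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)
layer = [0.2, 1.5, -0.7, 3.1, 0.9]
print(dropout(layer, rate=0.5, rng=rng))  # a fresh random mask on each call
```

Dropout is applied only during training; at prediction time the full network is used.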

Q14. What Is The Role Of The Activation Function?

The goal of an activation function is to introduce nonlinearity into the neural network so that it can learn more complex functions. Without it, the neural network would only be able to learn a function that is a linear combination of its input data.

Q15. Why Are Deep Networks Better Than Shallow Ones?

Both shallow and deep networks are capable of approximating any function. For the same level of accuracy, deeper networks can be much more efficient in terms of computation and number of parameters. Deeper networks are able to create deep representations, at every layer, the network learns a new, more abstract representation of the input.

Q16. Why Is Zero Initialization Not A Recommended Weight Initialization Technique?

As a result of setting weights in the network to zero, all the neurons at each layer are producing the same output and the same gradients during backpropagation.

The network can’t learn at all because there is no source of asymmetry between neurons. That is why we need to add randomness to weight initialization process.
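The symmetry problem can be seen in a small experiment. The sketch below trains a hypothetical network with two inputs, two sigmoid hidden units, and one linear output from all-zero weights; the input, target, and learning rate are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(x, target, W1, w2, lr=0.1):
    """One gradient step for a 2-input, 2-hidden-unit, 1-output network."""
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in W1]
    y = sum(w * hj for w, hj in zip(w2, h))
    d_y = y - target  # derivative of 0.5 * (y - target)^2 with respect to y
    new_w2 = [w - lr * d_y * hj for w, hj in zip(w2, h)]
    new_W1 = [
        [wi - lr * d_y * w2[j] * h[j] * (1 - h[j]) * xi for wi, xi in zip(W1[j], x)]
        for j in range(len(W1))
    ]
    return new_W1, new_w2


# All weights start at zero, so both hidden units start out identical.
W1 = [[0.0, 0.0], [0.0, 0.0]]
w2 = [0.0, 0.0]
for _ in range(50):
    W1, w2 = train_step([1.0, 2.0], 1.0, W1, w2)
print(W1[0] == W1[1], w2[0] == w2[1])  # True True
```

Both hidden units receive exactly the same update at every step, so they never become different from each other; random initialization breaks this tie.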