1. What are the responsibilities of a data analyst?
The responsibilities of a data analyst include:
2. What is required to become a data analyst?
To become a data analyst,
3. What are the various steps in an analytics project?
The various steps in an analytics project include:
4. What is data cleansing?
Data cleaning, also called data cleansing, deals with identifying and removing errors and inconsistencies from data in order to improve the quality of the data.
5. List some of the best practices for data cleaning?
6. What is logistic regression?
Logistic regression is a statistical method for examining a dataset in which one or more independent variables determine an outcome.
7. List some of the best tools that can be useful for data analysis?
8. What is the difference between data mining and data profiling?
The difference between data mining and data profiling is as follows:
Data profiling: It focuses on the instance-level analysis of individual attributes. It gives information on attributes such as value range, discrete values and their frequency, occurrence of null values, data type, length, etc.
Data mining: It focuses on cluster analysis, detection of unusual records, dependencies, sequence discovery, relations holding between several attributes, etc.
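To make the data profiling side concrete, here is a minimal sketch in plain Python that summarises a single attribute along the lines listed above (value range, distinct values and their frequency, nulls, data type, length). The function name profile_column and the sample ages column are hypothetical, and None is assumed to stand for a missing value.

```python
from collections import Counter


def profile_column(values):
    """Summarise one attribute; None is treated as a missing (null) value."""
    present = [v for v in values if v is not None]
    # Only real numbers take part in the value range (bool is excluded).
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "types": sorted({type(v).__name__ for v in present}),
        "distinct": len(set(present)),
        "frequencies": Counter(present).most_common(3),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "max_length": max((len(str(v)) for v in present), default=0),
    }


# Hypothetical sample column with two missing entries.
ages = [34, 29, None, 34, 41, None, 29, 34]
profile = profile_column(ages)
```

A real project would more likely rely on a data-frame library for this; the sketch only shows what kind of per-attribute information profiling produces.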
9. List some common problems faced by data analysts?
Some of the common problems faced by data analysts are:
10. What is the name of the framework developed by Apache for processing large data sets for an application in a distributed computing environment?
Hadoop, with its MapReduce programming model, is the framework developed by Apache for processing large data sets for an application in a distributed computing environment.
11. What are the missing patterns that are generally observed?
The missing patterns that are generally observed are:
12. What is time series analysis?
Time series analysis can be done in two domains: the frequency domain and the time domain. In time series analysis, the output of a particular process can be forecast by analyzing previous data with methods such as exponential smoothing, log-linear regression, etc.
13. What is correlogram analysis?
Correlogram analysis is a common form of spatial analysis in geography. It consists of a series of estimated autocorrelation coefficients, each calculated for a different spatial relationship. It can be used to construct a correlogram for distance-based data, when the raw data is expressed as distances rather than as values at individual points.
14. What is a hash table?
In computing, a hash table is a map of keys to values. It is a data structure used to implement an associative array. It uses a hash function to compute an index into an array of slots, from which the desired value can be fetched.
15. What are hash table collisions? How are they avoided?
A hash table collision happens when two different keys hash to the same slot. Two items cannot be stored in the same array slot. There are many techniques for handling hash table collisions; two are listed here:
Separate chaining: It uses a secondary data structure (such as a linked list) to store multiple items that hash to the same slot.
Open addressing: It searches for other slots using a second function and stores the item in the first empty slot that is found.
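The two collision-handling techniques can be sketched side by side. This is a minimal illustration under stated assumptions, not a production hash table: the class names ChainedHashTable and OpenAddressingTable are hypothetical, both tables have a fixed size with no resizing or deletion, the open-addressing variant uses double hashing as its "second function", and None is reserved as the empty-slot marker, so it cannot be used as a key.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, size=8):
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # colliding keys share the bucket

    def get(self, key, default=None):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return default


class OpenAddressingTable:
    """Open addressing with double hashing (fixed size, no deletion)."""

    def __init__(self, size=7):
        # A prime size guarantees every step length visits all slots.
        self.keys = [None] * size
        self.values = [None] * size

    def _probe(self, key):
        size = len(self.keys)
        start = hash(key) % size
        step = 1 + hash(key) % (size - 1)  # second function, never zero
        for i in range(size):
            yield (start + i * step) % size

    def put(self, key, value):
        for idx in self._probe(key):
            if self.keys[idx] is None or self.keys[idx] == key:
                self.keys[idx], self.values[idx] = key, value
                return
        raise RuntimeError("table is full")

    def get(self, key, default=None):
        for idx in self._probe(key):
            if self.keys[idx] is None:
                return default  # reached an empty slot: key is absent
            if self.keys[idx] == key:
                return self.values[idx]
        return default


chained = ChainedHashTable(size=8)
chained.put(1, "one")
chained.put(9, "nine")  # 1 and 9 both map to slot 1 when size is 8

probed = OpenAddressingTable(size=7)
probed.put(1, "one")
probed.put(8, "eight")  # 1 and 8 both start at slot 1 when size is 7
```

In the chained table both colliding keys end up in the same bucket, while in the open-addressing table the second key is moved to another slot chosen by its probe sequence.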
16. What is imputation? List out different types of imputation techniques?
During imputation, missing data is replaced with substituted values. The types of imputation techniques include single imputation and multiple imputation.
17. What is an n-gram?
An n-gram is a contiguous sequence of n items from a given sequence of text or speech. An n-gram model is a type of probabilistic language model that predicts the next item in such a sequence in the form of an (n-1)-order Markov model.
18. What are the criteria for a good data model?
The criteria for a good data model include:
19. Which imputation method is more favourable?
Although single imputation is widely used, it does not reflect the uncertainty created by data that is missing at random. So, multiple imputation is more favourable than single imputation when data is missing at random.
20. What is clustering? What are the properties of clustering algorithms?
Clustering is a classification method that is applied to data. A clustering algorithm divides a data set into natural groups or clusters.
The properties of clustering algorithms are:
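As an illustration of dividing a data set into groups, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data. K-means is only one possible clustering algorithm and is not named in the answer above; the function name kmeans_1d, the sample points, and the starting centroids are all assumptions chosen for the example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(p - centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments are stable, so stop early
        centroids = updated
    return centroids, clusters


# Hypothetical sample data with two visibly separate groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```

With these inputs the points near 1.5 and the points near 9.5 end up in separate clusters; real data usually needs several random restarts because the result depends on the starting centroids.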