Data Science and Big Data Analytics: Choose the Correct Answer

 

QUESTION 1

The TFIDF (or TF-IDF) is a measure that considers both  ______________ and ________________.

A.commonness of a term,  the scarcity of the term 

B.uncommonness of a term, the scarcity of the term

C.length of a term,  the scarcity of the term  

D.uncommonness of a term, the weakness of the term

5 points  

QUESTION 2

The term __________ refers to a specific implementation of association rules mining that many companies use for a variety of purposes.

A.market research analysis

B.market prediction analysis

C.market competitive analysis

D.market basket analysis

5 points  

QUESTION 3

A distribution over a fixed vocabulary of words is formally defined as a ________.

A.subject

B.topic

C.story

D.text line

5 points  

QUESTION 4

Your customer provided you with 3,000 unlabeled records and asked you to separate them into three groups. What is the correct analytical method to use?

A.K-means clustering

B.Naive Bayesian classification

C.Linear regression

D.Logistic regression 

5 points  

QUESTION 5

Time series analysis attempts to model the underlying structure of ________________ taken over time.

A.observation

B.patterns

C.solution

D.facts

5 points  

QUESTION 6

Which of the following algorithms is not an example of an ensemble learning algorithm?

A.Random Forest

B.AdaBoost

C.Gradient Boosting

D.Decision Trees  

5 points  

QUESTION 7

What is the difference between supervised learning and unsupervised learning?

A.Supervised learning algorithms work on data which are labelled. On the other hand, unsupervised learning algorithms work on unlabeled data.

B.Supervised learning algorithms work on data which are unlabeled. On the other hand, unsupervised learning algorithms work on labeled data.

C.Supervised learning algorithms work on raw data. On the other hand, unsupervised learning algorithms work on processed data.

D.None of these

5 points  

QUESTION 8

The ____________ is the most iterative phase and the one for which teams tend to underestimate the amount of effort involved.

A.Discovery Phase

B.Model Building Phase

C.Operationalization Phase

D.Data Preparation Phase

5 points  

QUESTION 9

Which levels does fdata contain after running the following R code? data = c(1,2,2,3,1,2,3,3,1,2,3,3,1); fdata = factor(data)

A.2,3,2

B.1,2,3

C.5,3,1

D.1,2,6

5 points  

QUESTION 10

A _______ is a table-like data structure available in languages like R and Python.

A.data frame

B.data file

C.data table

D.database

5 points  

QUESTION 11

Which of the following are Measures of Central Tendency?

A.Mean, Range, Mode

B.Mean, Standard Deviation, Range

C.Mode, Mean, Median

D.Range, Standard Deviation, Variance  

5 points  

QUESTION 12

________________________ is a probabilistic classification method based on Bayes’ theorem.

A.Naive function

B.Naive process

C.Naive Bayes

D.None of these

5 points  

QUESTION 13

What is a type I error? What is a type II error? Is one always more serious than the other? Why?

5 points  

QUESTION 14

The ______________ function builds a recursive partitioning and regression tree model and has four parameters.

A.lpart()

B.mpart()

C.rpart()

D.None of these

5 points  

QUESTION 15

In least squares regression, which of the following is not a required assumption about the error term?

A.The expected value of the error term is one.

B.The variance of the error term is the same for all values of x.

C.The values of the error term are independent.

D.The error term is normally distributed. 

5 points  

QUESTION 16

During the Model Building phase, the team builds and executes _____________________________________________.

A.The models based on the work done in the Planning phase

B.The business requirements provided by the business analyst

C.The models based on the work done in the Model Planning phase

D.None of the above

5 points  

QUESTION 17

Which of the following is the most important language in data science?

A.C#

B.Java

C.Ruby

D.R

5 points  

QUESTION 18

Your organization has a website where visitors randomly receive one of two coupons. It is also possible that visitors to the website will not receive a coupon. You have been asked to determine if offering a coupon to visitors to your website has any impact on their purchase decision. Which analysis method should you use?

A.One-way ANOVA

B.K-means clustering 

C.Association rules

D.Student's t-test

5 points  

QUESTION 19

How many steps does a text analysis problem consist of?

A.2

B.1

C.3

D.4

5 points  

QUESTION 20

A time series can consist of all of the following components except:

A.Time lapse

B.Trend

C.Cyclic

D.Seasonality

5 points  

QUESTION 21

Additional time series methods include all of the following except:

A.Autoregressive Moving Average with Exogenous inputs (ARMAX)

B.Spectral analysis

C.Kalman filtering

D.Single variable time series filtering

5 points  

QUESTION 22

The goal of POS tagging is to ______ whose input is a sentence.

A.build a text file 

B.build a model 

C.build a database

D.build a text graph  

5 points  

QUESTION 23

What happens in the final Operationalize phase? 

A.Requirements are gathered

B.The team delivers final reports, briefings, code, and technical documents. They may also run a pilot project to implement the models in a production environment.

C.The team delivers draft reports, draft briefings, code, and some technical documents. They may also run a pilot project to implement the models in a production environment.

D.None of the above

5 points  

QUESTION 24

What can be done if during the Discovery Phase the team decides that the available data is insufficient?

A.Cancel the project

B.Collect Additional Data

C.Work with what you already have

D.Do nothing

5 points  

QUESTION 25

In regression, the equation that describes how the response variable (y) is related to the explanatory variable (x) is

A.the correlation model

B.the regression model

C.used to compute the correlation coefficient

D.None of the above

5 points  

QUESTION 26

In Chapter 8, a time series consists of an __________________ sequence of equally spaced values over time.

A.Unordered

B.Bilateral

C.ordered

D.lateral

5 points  

QUESTION 27

One advantage of ARIMA modeling is that the analysis can be based on _________________________ for the variable of interest.

A.future time series data

B.historical time series data

C.historical time lapse data

D.None of the above

5 points  

QUESTION 28

What are the ‘resources’ being assessed in the Discovery Phase?

A.Cloud Resources

B.The business environment and business partners' resources

C.Technology, Tools, Systems, Data, and People

D.None of the above

5 points  

QUESTION 29

Suppose you are using a bagging-based algorithm, say a random forest, in model building. Which of the following can be true?

  1. The number of trees should be as large as possible
  2. You will have interpretability after using the random forest

A.1

B.2

C.1 and 2

D.None of these   

5 points  

QUESTION 30

The HDFS block size is larger than the size of the disk blocks so that _____________________

A.Only HDFS files can be stored in the disk used.

B.The seek time is maximum

C.Transfer of a large file made of multiple disk blocks is not possible.

D.A single file larger than the disk size can be stored across many disks in the cluster.

5 points  

QUESTION 31

The IDF inversely corresponds to the ______________________ , which is defined to be the number of documents in the corpus that contain a term.

A.document frequency (DF)

B.directory frequency (DF)

C.docker frequency (DF)

D.None of the above

5 points  

QUESTION 32

The arima() function in R uses ___________________________________ to estimate the model coefficients.

A.Maximum Likelihood Estimation (MLE)

B.Mini Likelihood Estimation (MLE)

C.Minimum Likelihood Estimation (MLE)

D.Mining Likelihood Estimation (MLE)

5 points  

QUESTION 33

R functionality is divided into a number of ________

A.Stored Procedures

B.Functions

C.Domains

D.Packages

5 points  

QUESTION 34

A __________________________ is a simple and widely used visualization for finding the relationship among multiple variables and can represent data with up to five variables.

A.scatterplot

B.Dotchart and Barplot

C.Straight Plot

D.Box-and-Whisker Plot

5 points  

QUESTION 35

Which of the following R functions can best provide descriptive statistics, such as the mean and median, about a variable such as the sales data frame?

A.ggplot2()

B.dplyr()

C.stringr()

D.summary()

5 points  

QUESTION 36

Your colleague, who is new to Hadoop, approaches you with a question. They want to know how best to access their data. This colleague has a strong background in data flow languages and programming. Which query interface would you recommend?

A.Howl

B.Pig

C.Hive

D.HBase 

5 points  

QUESTION 37

Many quantitative analysts use R as their ____ tool.

A.Leading tool

B.Programming tool

C.Primary Tool

D.All of the above

5 points  

QUESTION 38

In R, the ___________________ function creates a time series object from a vector or a matrix. 

A.ts()

B.tk()

C.ttime()

D.plot()

5 points  

QUESTION 39

According to Chapter 4 of your textbook, clustering analysis groups _______________ objects based on the objects’ __________.

A.similarity, cost

B.position, similarity

C.similarity, attributes

D.rank, attributes

5 points  

QUESTION 40

You have run the association rules algorithm on your data set, and the two rules {banana, apple} => {grape} and {apple, orange} => {grape} have been found to be relevant. What else must be true?

A.{banana, apple, grape, orange} must be a frequent itemset.

B.{banana, apple} => {orange} must be a relevant rule.

C.{grape} => {banana, apple} must be a relevant rule.

D.{grape, apple, orange} must be a frequent itemset. 

5 points  

QUESTION 41

In regression analysis, the variable that is being predicted is the

A.Response, or dependent variable 

B.Independent variable 

C.Intervening variable

D.Usually X

5 points  

QUESTION 42

Which function is used to create a vector with more than one element?

A.library()

B.plot()

C.c()

D.par()

5 points  

QUESTION 43

During the Model Building phase, the team develops…
