A Machine Learning Algorithm Cheat Sheet: Python and R Code Side by Side


In Think and Grow Rich, Napoleon Hill recounts the story of Darby, who dug for gold for years and then gave up just one step short of the vein, missing the treasure entirely.

The Chinese edition of Think and Grow Rich is available on Douban Reading:

http://read.douban.com/reader/ebook/10954762/

(The story above is adapted from the book.)

Now, I don't know whether that story is true, but I do know quite a few "data Darbys" around me. These people understand the purpose and mechanics of machine learning, yet they apply only two or three algorithms to every problem they study. They never refresh themselves with better algorithms and techniques, either because they are too stubborn or because they are simply marking time rather than making progress.

Like Darby, they give up just short of the finish line. In the end they abandon machine learning, citing excuses such as heavy computation, too much difficulty, or the inability to set a suitable threshold to tune the model. What is the point of that? Have you met people like this?

Today's cheat sheet aims to change these "data Darbys" into active practitioners of machine learning. It collects the 10 most commonly used machine learning algorithms, each with Python and R code.

Given how widely machine learning methods are now used in modeling, the cheat sheet below can serve as a code reference to help you put these algorithms to work. Good luck!

For the truly lazy data Darbys, we will make your life even easier: you can download a PDF version of the cheat sheet and copy-paste the code directly.

Machine Learning Algorithms by Type

Supervised learning: Decision Tree, K-Nearest Neighbors, Random Forest, Logistic Regression
Unsupervised learning: Apriori, K-Means, Hierarchical Clustering
Reinforcement learning: Markov Decision Process, Q-Learning

Linear Regression

Python:

#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import linear_model
#Load Train and Test datasets
#Identify feature and response variable(s);
#values must be numeric and numpy arrays
x_train = input_variables_values_training_datasets
y_train = target_variables_values_training_datasets
x_test = input_variables_values_test_datasets
#Create linear regression object
linear = linear_model.LinearRegression()
#Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)
#Equation coefficient and intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
#Predict output
predicted = linear.predict(x_test)

R:

#Load Train and Test datasets
#Identify feature and response variable(s);
#values must be numeric
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
x <- cbind(x_train, y_train)
#Train the model using the training sets and check score
linear <- lm(y_train ~ ., data = x)
summary(linear)
#Predict output
predicted <- predict(linear, x_test)
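To make the template concrete, here is a minimal runnable sketch on synthetic data; the toy arrays and expected outputs noted in the comments are illustrative assumptions, not part of the original cheat sheet.

import numpy as np
from sklearn import linear_model

#Toy data: y is roughly 2*x + 1 with a little noise
x_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([3.1, 4.9, 7.2, 9.0, 11.1])
x_test = np.array([[6.0], [7.0]])

linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
print('R^2 on training data:', linear.score(x_train, y_train))
print('Predictions:', linear.predict(x_test))  #expect values near 13 and 15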

Logistic Regression

Python:

#Import Library
from sklearn.linear_model import LogisticRegression
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the
#test dataset
#Create logistic regression object
model = LogisticRegression()
#Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Equation coefficient and intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
#Predict output
predicted = model.predict(x_test)

R:

x <- cbind(x_train, y_train)
#Train the model using the training sets and check score
logistic <- glm(y_train ~ ., data = x, family = 'binomial')
summary(logistic)
#Predict output
predicted <- predict(logistic, x_test)
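A minimal runnable sketch, again on made-up toy data (the arrays below are illustrative assumptions):

import numpy as np
from sklearn.linear_model import LogisticRegression

#Toy binary classification: label is 1 when the feature is above 3
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
x_test = np.array([[2.5], [4.5]])

model = LogisticRegression()
model.fit(X, y)
print('Accuracy on training data:', model.score(X, y))
print('Predictions:', model.predict(x_test))  #expect [0 1]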

Decision Tree

Python:

#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import tree
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test dataset
#Create tree object
model = tree.DecisionTreeClassifier(criterion='gini')
#For classification, the splitting criterion can be set to
#gini or entropy (information gain); by default it is gini
#model = tree.DecisionTreeRegressor() for regression
#Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Predict output
predicted = model.predict(x_test)

R:

#Import Library
library(rpart)
x <- cbind(x_train, y_train)
#Grow tree
fit <- rpart(y_train ~ ., data = x, method = "class")
summary(fit)
#Predict output
predicted <- predict(fit, x_test)
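As a quick hedged illustration of what a tree can do that a linear model cannot, here is a runnable sketch on a tiny XOR-style dataset (the data below is an assumption for demonstration only):

import numpy as np
from sklearn import tree

#Toy data: class depends on whether the two binary features differ (XOR)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

model = tree.DecisionTreeClassifier(criterion='gini')
model.fit(X, y)
print('Training accuracy:', model.score(X, y))  #a deep enough tree fits XOR exactly
print('Predictions:', model.predict(np.array([[0, 1], [1, 1]])))  #expect [1 0]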

Support Vector Machine (SVM)

Python:

#Import Library
from sklearn import svm
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test dataset
#Create SVM classification object
model = svm.SVC()
#There are various options associated with it;
#this is simple classification.
#Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Predict output
predicted = model.predict(x_test)

R:

#Import Library
library(e1071)
x <- cbind(x_train, y_train)
#Fitting model
fit <- svm(y_train ~ ., data = x)
summary(fit)
#Predict output
predicted <- predict(fit, x_test)
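A minimal runnable sketch on two well-separated synthetic clusters (toy arrays are illustrative assumptions):

import numpy as np
from sklearn import svm

#Toy linearly separable data in 2D
X = np.array([[1, 1], [1, 2], [2, 1], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

model = svm.SVC()  #RBF kernel by default; use kernel='linear' for a linear boundary
model.fit(X, y)
print('Training accuracy:', model.score(X, y))
print('Predictions:', model.predict(np.array([[1.5, 1.5], [5.5, 5.5]])))  #expect [0 1]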

Naive Bayes

Python:

#Import Library
from sklearn.naive_bayes import GaussianNB
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test dataset
#Create Gaussian naive Bayes object
model = GaussianNB()
#Other variants exist for other feature distributions,
#such as multinomial and Bernoulli naive Bayes
#Train the model using the training sets and check score
model.fit(X, y)
#Predict output
predicted = model.predict(x_test)

R:

#Import Library
library(e1071)
x <- cbind(x_train, y_train)
#Fitting model
fit <- naiveBayes(y_train ~ ., data = x)
summary(fit)
#Predict output
predicted <- predict(fit, x_test)
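A minimal runnable sketch; the two Gaussian-ish clusters below are made up for illustration:

import numpy as np
from sklearn.naive_bayes import GaussianNB

#Toy data: two clusters on a single feature
X = np.array([[1.0], [1.2], [0.8], [5.0], [5.3], [4.7]])
y = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNB()
model.fit(X, y)
print('Predictions:', model.predict(np.array([[1.1], [5.1]])))  #expect [0 1]
print('Class probabilities:', model.predict_proba(np.array([[3.0]])))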

K-Nearest Neighbors (KNN)

Python:

#Import Library
from sklearn.neighbors import KNeighborsClassifier
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test dataset
#Create KNeighbors classifier object
model = KNeighborsClassifier(n_neighbors=6)
#Default value for n_neighbors is 5
#Train the model using the training sets and check score
model.fit(X, y)
#Predict output
predicted = model.predict(x_test)

R:

#Import Library
#knn() comes from the class package; it takes the training set,
#test set, and training labels directly rather than a formula
library(class)
#Fit and predict in one step
predicted <- knn(train = x_train, test = x_test, cl = y_train, k = 5)
summary(predicted)
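A minimal runnable sketch on synthetic data (the toy arrays are illustrative assumptions):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

#Toy data: two clusters on a single feature
X = np.array([[1.0], [1.5], [2.0], [8.0], [8.5], [9.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)
print('Predictions:', model.predict(np.array([[1.7], [8.2]])))  #expect [0 1]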

K-Means Clustering

Python:

#Import Library
from sklearn.cluster import KMeans
#Assumed you have X (attributes) for the training data set
#and x_test (attributes) for the test dataset
#Create KMeans object
k_means = KMeans(n_clusters=3, random_state=0)
#Train the model using the training set
k_means.fit(X)
#Predict output
predicted = k_means.predict(x_test)

R:

#Import Library
library(cluster)
#3-cluster solution
fit <- kmeans(X, 3)
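A minimal runnable sketch; the three hand-placed clusters below are an assumption for demonstration:

import numpy as np
from sklearn.cluster import KMeans

#Toy data: three well-separated clusters in 2D
X = np.array([[1, 1], [1.5, 2], [8, 8], [8, 8.5], [1, 8], [1.5, 8.5]])

k_means = KMeans(n_clusters=3, random_state=0, n_init=10)
k_means.fit(X)
print('Cluster labels:', k_means.labels_)  #points in the same cluster share a label
print('Centers:\n', k_means.cluster_centers_)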

Random Forest

Python:

#Import Library
from sklearn.ensemble import RandomForestClassifier
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test dataset
#Create Random Forest object
model = RandomForestClassifier()
#Train the model using the training sets and check score
model.fit(X, y)
#Predict output
predicted = model.predict(x_test)

R:

#Import Library
library(randomForest)
x <- cbind(x_train, y_train)
#Fitting model
fit <- randomForest(y_train ~ ., data = x, ntree = 500)
summary(fit)
#Predict output
predicted <- predict(fit, x_test)
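A minimal runnable sketch on synthetic data (toy arrays and the fixed random_state are illustrative assumptions):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

#Toy data: label is 1 when both features are large
X = np.array([[1, 1], [2, 1], [1, 2], [7, 8], [8, 7], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
print('Training accuracy:', model.score(X, y))
print('Predictions:', model.predict(np.array([[1.5, 1.5], [7.5, 7.5]])))  #expect [0 1]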

Dimensionality Reduction (PCA)

Python:

#Import Library
from sklearn import decomposition
#Assumed you have training and test data sets as train and test
#Create PCA object; by default the number of components
#is min(n_samples, n_features)
pca = decomposition.PCA(n_components=k)
#For factor analysis:
#fa = decomposition.FactorAnalysis()
#Reduce the dimension of the training dataset using PCA
train_reduced = pca.fit_transform(train)
#Reduce the dimension of the test dataset
test_reduced = pca.transform(test)

R:

#Import Library
library(stats)
pca <- princomp(train, cor = TRUE)
train_reduced <- predict(pca, train)
test_reduced <- predict(pca, test)
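A minimal runnable sketch showing PCA compressing nearly collinear 2D points to one dimension (the toy data is an assumption for illustration):

import numpy as np
from sklearn import decomposition

#Toy data: 2D points that lie almost on a line, so one
#principal component captures nearly all the variance
train = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1]])

pca = decomposition.PCA(n_components=1)
train_reduced = pca.fit_transform(train)
print('Explained variance ratio:', pca.explained_variance_ratio_)  #close to 1.0
print('Reduced data:\n', train_reduced)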

GBDT (Gradient Boosting Decision Tree)

Python:

#Import Library
from sklearn.ensemble import GradientBoostingClassifier
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test dataset
#Create Gradient Boosting Classifier object
model = GradientBoostingClassifier(n_estimators=100,
    learning_rate=1.0, max_depth=1, random_state=0)
#Train the model using the training sets and check score
model.fit(X, y)
#Predict output
predicted = model.predict(x_test)

R:

#Import Library
library(caret)
x <- cbind(x_train, y_train)
#Fitting model
fitControl <- trainControl(method = "repeatedcv", number = 4, repeats = 4)
fit <- train(y_train ~ ., data = x, method = "gbm",
             trControl = fitControl, verbose = FALSE)
predicted <- predict(fit, x_test, type = "prob")[,2]
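A minimal runnable sketch using the same hyperparameters as the template above, on made-up toy data:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

#Toy data: label is 1 when the feature exceeds a threshold
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = GradientBoostingClassifier(n_estimators=100,
    learning_rate=1.0, max_depth=1, random_state=0)
model.fit(X, y)
print('Training accuracy:', model.score(X, y))
print('Predictions:', model.predict(np.array([[2.5], [6.5]])))  #expect [0 1]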

Originally published: 2015-12-02

This article comes from 大数据文摘 (Big Data Digest), a partner of 云栖社区 (Yunqi Community); for more, follow the WeChat official account "BigDataDigest".
