This is a companion discussion topic for the original entry at https://dphi.tech/practice/challenge/55
Hi Team,
I am getting an "invalid output, expected 1 or 0" error while uploading my submission file. Since this is a multi-class classification problem with 4 output classes, I want to know why we are getting this error when uploading the CSV file.
@sagarnarula could you please submit now? We changed the evaluation metric and it should work fine now.
cc: @anjum_r @kanishksh4rma @parth_nipun_dave @harshita13 @srinathkr
# Build the submission file: a single 'prediction' column with the predicted class labels.
# y_pred holds the model's predictions on the test set (generated earlier, not shown here).
import pandas as pd
from google.colab import files  # assumes this runs in Google Colab

target = pd.DataFrame()
target['prediction'] = y_pred
target.to_csv('Submission.csv', index=False)
files.download('Submission.csv')
Yes, done! Thanks
How do you improve accuracy? Do you try changing parameters and tuning the models randomly, or is there a better approach? Can you help me out?
I would suggest a RandomizedSearchCV or GridSearchCV over the hyperparameters used in the model :). If that doesn't work, you could try another model or do some feature selection/engineering. A minimal sketch of grid search is below.
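As a rough illustration of the suggestion above (not the poster's actual code), here is a minimal scikit-learn sketch of grid-searching an ExtraTreesClassifier; the parameter grid and the X_train / y_train names are assumptions for the example.

# Hypothetical sketch: grid search over a small hyperparameter grid
# (assumes X_train and y_train are already loaded from the training data).
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [100, 200, 400],
    'max_depth': [10, 20, None],
}
search = GridSearchCV(ExtraTreesClassifier(random_state=2020), param_grid,
                      cv=5, scoring='accuracy', n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)

If the grid becomes large, RandomizedSearchCV with a fixed n_iter is usually a cheaper alternative.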
Hi All,
Thanks @dphi for giving us a nice competition to work on at the beginning of the year.
This time I wanted to do something unique (get the best solution in minimal lines of code). So here is my 6-line solution that got second rank:
import pandas as pd; import numpy as np; from sklearn.ensemble import ExtraTreesClassifier
train_df = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/sukhna_dhanas/train_set_label.csv")
test_df = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/sukhna_dhanas/test_set_label.csv')
train_y = train_df['microorganism'].values
preds = ExtraTreesClassifier(n_estimators=200, random_state=2020, max_depth=21).fit(train_df.drop(['microorganism'], axis=1).values, train_y).predict(test_df.values)
pd.DataFrame(preds, columns=['prediction']).to_csv('extra_trees.csv', index=False)
I didn't get this result on the first try though. I started with LightGBM, XGBoost and CatBoost for my initial submissions. Then I used the GML library (developed by @muhammad4hmed and Naman) to get an idea of which algorithms were performing well over 10 folds (demo link: https://github.com/Muhammad4hmed/GML/blob/master/DEMO/AutoMachineLearning.ipynb), and it showed Extra Trees performing better than the others. Since I had the best-performing algorithm, I tuned its parameters and got the results. A plain scikit-learn sketch of the same 10-fold comparison idea is below.
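For readers who don't have GML installed, here is a hedged sketch of the same idea using only scikit-learn: a 10-fold cross-validation comparison of a few candidate classifiers on the training data from the 6-line solution above. The specific candidate models and the accuracy metric are assumptions for illustration, not the GML output.

# Hypothetical sketch: compare a few sklearn classifiers with 10-fold CV,
# reusing train_df / train_y from the 6-line solution above.
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X = train_df.drop(['microorganism'], axis=1).values
candidates = {
    'extra_trees': ExtraTreesClassifier(n_estimators=200, random_state=2020),
    'random_forest': RandomForestClassifier(n_estimators=200, random_state=2020),
    'grad_boost': GradientBoostingClassifier(random_state=2020),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, train_y, cv=10, scoring='accuracy')
    print(f'{name}: {scores.mean():.4f}')

Whichever model scores best in this comparison is then the natural candidate for the hyperparameter tuning step described earlier in the thread.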
Congratulations! BTW, the person in first place also used GML xD, so it was GML vs GML. Impressive short solution, by the way!