Solve Datathon 1: Beginners: Taxpayer's Political Party Challenge | DPhi

A tax is a compulsory financial charge or some other type of levy imposed on a taxpayer (an individual or legal entity) by a governmental organization in order to fund government spending and various public expenditures.

This is a companion discussion topic for the original entry at

Can someone tell me what format the CSV file should be in? I have followed the instructions and matched the format of the file to be submitted, but it keeps giving me an evaluation error.

@sabeehalam Have you changed the column name to prediction? If you still face the issue after changing this, you can send your prediction file to

No, I had to convert it to a table to make it work.

I have completed the challenge, but my prediction model only got an accuracy of 21%. If possible, can someone share a working script showing what you are supposed to do to build the prediction model? I am a beginner in this field and I really want to learn. I hope someone with a high-accuracy model can share their script after the deadline.

Is there a specific accuracy we have to reach, or is it just about doing as well as we can?

When this datathon ends, is DPhi going to put the top models with prediction accuracy above 70% on GitHub? I want to learn how they did it. I have tried hyperparameter tuning with GridSearchCV and RandomizedSearchCV, and a GradientBoostingClassifier, but my model could only reach 40% accuracy. I have no idea what else to do to improve it.

Also, on the leaderboard I saw people with 100% accuracy. Is that really possible? It doesn’t seem to make sense to get 100% model accuracy in a real-life situation. Isn’t that going to overfit?

Exactly my thought, man. The features are barely even correlated enough to give that good a model. But I don’t know enough about the field to be commenting on someone’s work.
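
One way to look at the overfitting question raised above is to compare accuracy on the data a model was fitted on with cross-validated accuracy. The sketch below is only an illustration: the synthetic data from `make_classification` (with deliberately noisy labels) stands in for the competition data, and the unconstrained decision tree is an assumed example of a model that can memorize its training set. It says nothing about how any leaderboard entry was actually built.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 3 classes, 20% of labels randomized.
X, y = make_classification(
    n_samples=400,
    n_features=8,
    n_informative=3,
    n_classes=3,
    n_clusters_per_class=1,
    flip_y=0.2,
    random_state=0,
)

# A tree with no depth limit can memorize every training row.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)
train_accuracy = tree.score(X, y)

# 5-fold cross-validation scores the same kind of model on rows it never saw.
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
cv_accuracy = cv_scores.mean()

print(f"accuracy on training rows: {train_accuracy:.2f}")
print(f"cross-validated accuracy:  {cv_accuracy:.2f}")
```

A large gap between the two numbers is the usual sign of overfitting. A perfect score on data the model was trained on does not by itself show the model generalizes.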

My prediction is in a CSV file with the header “Prediction”, but it is still showing an evaluation error.
Please, what can I do to rectify this?

I’m facing the same problem too :frowning:


How do we evaluate accuracy on the test data when the target variable is not given?
Accuracy requires both the target and the predicted values.

Is it necessary to split the training data, or can we use that dataset as it is?

I got the fix. The issue was that the prediction file had an index column, so I deleted it and then tried to submit. It finally got submitted, but the accuracy is very low (around 36%).
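
For anyone hitting the same error, here is a minimal sketch of writing predictions without the index column, assuming pandas is used. The column name `prediction` follows the organizer's earlier reply. The sample values and the file name `submission.csv` are placeholders, not part of the official instructions.

```python
import io

import pandas as pd


def to_submission_csv(predictions):
    """Return CSV text with a single 'prediction' column and no index column."""
    submission = pd.DataFrame({"prediction": list(predictions)})
    buffer = io.StringIO()
    # index=False stops pandas from writing the row index as an extra column.
    submission.to_csv(buffer, index=False)
    return buffer.getvalue()


# Placeholder predictions; in practice these come from model.predict(...).
csv_text = to_submission_csv([0, 1, 1, 0])
print(csv_text)

# To write a file instead (hypothetical file name):
# pd.DataFrame({"prediction": [0, 1, 1, 0]}).to_csv("submission.csv", index=False)
```

Without `index=False`, `to_csv` writes the row index as an unnamed first column, which matches the extra column described above.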

You have to split and train. After the model is created, follow the guidelines to submit, i.e. generate the CSV file by creating new_test_data, predicting your model’s output on it, and saving that in the target variable.

But separate train and test data are given, so why do we have to split again?

But that test file has only input features. How will you get y_test? So I think splitting is necessary.

But that y_test would relate to the test data only, so why do we have to split the train data?

How did you evaluate accuracy for the test data?
What was the accuracy on the train data?

They have clearly mentioned that the target (output) is not present in the test data they provided.

You need to evaluate accuracy using the accuracy_score method, but that is only for us to find out the accuracy of our own model. We can’t evaluate accuracy on the test data they provided, so we just need to send the predicted values in a CSV file. These will be compared with the real values (hidden from us), and when you submit the CSV file you will get your model’s score.
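
A minimal sketch of the local check described above, assuming scikit-learn. The synthetic data from `make_classification` is only a stand-in for the competition's training file, and the 80/20 split and `GradientBoostingClassifier` are illustrative choices, not the organizers' method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in for the labelled training file (assumption: real data would be loaded here).
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=4,
    n_classes=3,
    random_state=0,
)

# Hold out 20% of the labelled rows to act as a local "test set".
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# accuracy_score needs true labels, which only the held-out rows have.
val_pred = model.predict(X_val)
val_accuracy = accuracy_score(y_val, val_pred)
print(f"validation accuracy: {val_accuracy:.2f}")

# The provided test file has no labels, so for it you can only call
# model.predict(...) and submit the result for scoring.
```

The validation accuracy is an estimate of the score the submission might receive. The two can differ because the hidden test labels are never seen locally.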

Hi @ismail
Don’t worry! We will be sharing the top performers’ notebooks at the end of the bootcamp so that all other learners can go through them and learn.

Hi @sabeehalam
No, you don’t need to reach any specific accuracy. You can try to make your model perform as well as you can through optimization.

That’s what I said. So we can’t evaluate the accuracy on the test data?

That means we cannot evaluate the accuracy for this datathon.