Solve Datathon 1: Advanced: Travel Insurance Claim Prediction Challenge | DPhi

Yes @lucifer_067
The pre-processing steps, such as one-hot encoding and removal of irrelevant columns, should also be applied to the test data. But do not remove any rows from the test data. You need to submit exactly as many predicted values as there are records in the test data; otherwise you will get an error when making a submission.
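As a rough illustration of that advice, here is a minimal sketch using pandas. The toy frame, the `ID` column, and the choice to fill missing values (rather than drop rows) are assumptions made for the example, not details of the actual challenge data.

```python
import pandas as pd

# Hypothetical stand-in for the provided test data.
test = pd.DataFrame({
    "ID": [101, 102, 103],
    "Product Name": ["Basic Plan", None, "Silver Plan"],
})

def preprocess(df):
    """Column-level steps only: no rows are removed."""
    out = df.drop(columns=["ID"])  # drop an irrelevant column
    # Fill gaps instead of calling dropna(), which would remove rows.
    out["Product Name"] = out["Product Name"].fillna("Unknown")
    return pd.get_dummies(out, columns=["Product Name"])  # one-hot encode

X_test = preprocess(test)

# The submission must contain one prediction per original test record.
assert len(X_test) == len(test)
```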

Why am I seeing an evaluation error after I made my submission?
Please, is this what you all saw with yours?

Please, I have followed all the steps for saving the submission file and I am still getting an evaluation error. I think it comes from the index: the sample format does not have an index, while mine does. I am trying to remove it by setting index=False, but it didn't work. I really need help.

@manish_kc_06 I am getting "cannot import name ‘six’ " while trying SMOTE for the question 3

However, I got the following
ERROR: Could not find a version that satisfies the requirement imblearn==0.4.3 (from versions: 0.0)
ERROR: No matching distribution found for imblearn==0.4.3

when I tried

!pip install scikit-learn==0.23.1

!pip install imblearn==0.4.3

How can I remove the index from the DataFrame?

Hi @samuelotisi
Generally, the given code (i.e. index=False) should work to remove the index. If it does not, try removing the index column manually in the Excel sheet.
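For reference, a minimal sketch of writing a submission without the index and reading it back to confirm. The file name `submission.csv` and the toy predictions are assumptions; the `prediction` header is the one participants mention in this thread.

```python
import os
import tempfile

import pandas as pd

# Hypothetical predictions; in practice these come from model.predict(...).
predictions = [0, 1, 0, 0]

# A temporary folder is used here only so the example leaves no files behind.
path = os.path.join(tempfile.mkdtemp(), "submission.csv")

submission = pd.DataFrame({"prediction": predictions})
submission.to_csv(path, index=False)  # index=False omits the row index

# Read the file back to confirm that only the expected column was written.
check = pd.read_csv(path)
columns_written = check.columns.tolist()
```

If the read-back check shows an extra column, the index was probably written in an earlier save and then loaded again as ordinary data.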

Hi @amahbubul85
Are you working on jupyter notebook or Google Colab notebook?

First, check the installed version of the library using the code below:

import imblearn
imblearn.__version__ 

If the version is not 0.4.3, try running the code below:

!pip install imbalanced-learn==0.4.3

Or use Google Colab to solve the assignment.
Let me know if you still face the issue.

Thanks,

Hi! I submitted the results many times yesterday and today (and tried different web browsers as well), but the Status keeps showing up as “Internal error”. Did anyone face a similar issue? How can I solve this? I hope someone can help me, thanks.

Hi @semanurkps
The precision and recall values for your predictions are both zero, so their sum (precision + recall) is also zero. The F1 score formula divides by (precision + recall), which results in a ‘float division by zero’ error; this is reported as an internal error. Please optimize your model.
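A small sketch of the arithmetic, for illustration only. The function names are hypothetical and this is not the platform's actual evaluation code.

```python
def f1_naive(precision, recall):
    # F1 = 2 * P * R / (P + R); undefined when P + R == 0.
    return 2 * precision * recall / (precision + recall)

def f1_safe(precision, recall):
    # Common convention: report 0.0 when both precision and recall are zero.
    denominator = precision + recall
    if denominator == 0:
        return 0.0
    return 2 * precision * recall / denominator

# A model that never predicts the positive class correctly has P = R = 0.
try:
    f1_naive(0.0, 0.0)
    raised = False
except ZeroDivisionError:
    raised = True
```

This usually happens when the model predicts only the majority class, so checking how many positive predictions the model makes is a quick first diagnostic.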

Thanks

Hi @chanukya, @manish_kc_06 and all, can anyone help me solve the evaluation error for my predictions.csv?
I have followed all these steps:

  • header name - “prediction”
  • there is no index column in my prediction file
  • same number of observations as in test dataset (14478,)

Kindly help to solve this evaluation error :slight_smile: Thank you.

Still, I am getting an error saying that the number of records is too few.

@ashishjadon I am facing the same issue too. Maybe they have set some specific F1-score criterion, or something else is the issue. :slight_smile:

Hi @semanurkps
We can’t see your notebook. You can ping me on Slack. Also, don’t share notebooks on the public forum while the assignment is live.

Hi @ashishjadon and @vinrock19
There are 15832 observations in the test dataset

The issue is solved. Just in case anyone faces similar issues in future: it was because my predictions are float (1.0) and they must be int (1) for the evaluation. Thanks! I removed the comment including notebook link as well, sorry.
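For anyone hitting the same problem, a minimal sketch of the conversion, assuming the predictions are class labels held in a pandas DataFrame. The values below are made up for the example.

```python
import pandas as pd

# Hypothetical model output: class labels stored as floats.
raw_predictions = [1.0, 0.0, 0.0, 1.0]

submission = pd.DataFrame({"prediction": raw_predictions})

# Cast labels to integers so the file contains 1 and 0 rather than 1.0 and 0.0.
submission["prediction"] = submission["prediction"].astype(int)

# With no path given, to_csv returns the CSV text, which is handy for checking.
csv_text = submission.to_csv(index=False)
```

Note that `astype(int)` truncates, so it is only appropriate for values that are already whole-number labels, not for predicted probabilities.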

Hi, @manish_kc_06 now I understood that you are referring to a separate test_dataset, one provided with the link in data. Actually, I was referring to the test_data obtained after the split.

So, I need to perform OHE for this test dataset also, am I right?

@manish_kc_06 When I apply one-hot encoding, I get the following error when I pass the test data to the model trained on train_dataset:

ValueError: Number of features of the model must match the input. Model n_features is 49 and input n_features is 48

This is because the Product Name feature has 25 classes in test_dataset but 26 classes in train_dataset.

What should I do in this case?

Hi @vinrock19
Yes, you are right

Please go through this: Handling Unknown Categories in both train and test set during One Hot Encoding
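One common way to handle the mismatch, sketched here with pandas, is to reindex the encoded test columns against the training columns. This is not necessarily the approach taken in the linked article, and the category values below are made up for the example.

```python
import pandas as pd

# Hypothetical data: one category ("Gold Plan") appears only in the training set.
train = pd.DataFrame({"Product Name": ["Basic Plan", "Gold Plan", "Silver Plan"]})
test = pd.DataFrame({"Product Name": ["Basic Plan", "Silver Plan"]})

X_train = pd.get_dummies(train, columns=["Product Name"])
X_test = pd.get_dummies(test, columns=["Product Name"])

# Give the test matrix exactly the training columns, in the same order.
# Categories missing from test become all-zero columns;
# categories seen only in test would be dropped.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

assert list(X_test.columns) == list(X_train.columns)
```

The number of rows in the test data is unchanged; only the columns are aligned, so the model receives the same number of features it was trained on.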

Thanks, @manish_kc_06 now I realized where I was missing out :slight_smile: