Getting Started With Data Science Bootcamp: Final Assignment | DPhi

Perform the data cleaning accordingly. Only one of them is correct information; both of them cannot be true at the same time.

@manish_kc_06 Hi, please confirm the last day to submit the notebook for the Datathon. In Dashboard -> Assignment tab, it is given as 22 April, but when I click on the link to register, the last day is 25 April. Please confirm. Thanks in advance.

Hi @priyankagrover
The deadline for the final assignment has been extended by 3 days. So the final deadline to submit predictions or upload notebooks is 25 April, as given on the datathon page.

Hello, I need help calculating the difference between the call_start and call_end columns and converting the result into an int datatype. Please guide me.

Please help!
After applying OneHotEncoding to the train_data, the shape becomes (3102, 4673), but when the same encoding is applied to the test_data provided by dphi, the shape becomes (935, 1781). As a result, my model, which scores 0.81, doesn't work on the test_data provided.

I keep getting this error: ValueError: X has 1781 features per sample; expecting 4673

What am I doing wrong?

Hi @codepanther
You are not applying one hot encoding properly. Please visit this tutorial to understand how to do one hot encoding using OneHotEncoder(): Handling Unknown Categories in both train and test set during One Hot Encoding
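For anyone hitting the same shape mismatch, here is a minimal sketch of the idea, assuming scikit-learn's OneHotEncoder and two small made-up frames standing in for the real train and test files (the column names and values are hypothetical). The encoder is fitted on the training data only and then reused on the test data, so both end up with the same number of columns:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-ins for the train and test files (hypothetical columns and values).
train = pd.DataFrame({"job": ["admin", "technician", "admin"],
                      "marital": ["single", "married", "married"]})
test = pd.DataFrame({"job": ["student", "admin"],   # "student" never appears in train
                     "marital": ["married", "single"]})

# Fit on the training data only. With handle_unknown="ignore", a category
# that was not seen during fit is encoded as all zeros instead of raising.
encoder = OneHotEncoder(handle_unknown="ignore")
X_train = encoder.fit_transform(train).toarray()

# Reuse the fitted encoder: transform(), not fit_transform(), on the test data.
X_test = encoder.transform(test).toarray()

print(X_train.shape, X_test.shape)  # both arrays have 4 columns
```

Calling fit_transform() separately on train and test is the usual cause of the "X has 1781 features per sample; expecting 4673" error, because each call builds its own set of columns.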

Hi @akashbnsl88
First convert the data from object type to datetime type, then compute data['call_start'] - data['call_end'].

Yes, I have been trying that, but is it possible to convert the datetime format to int format after that, for the purpose of EDA and for using the column to measure correlation?
If so, how can we do that?
I can send you the code where I am stuck.

You can convert a datetime object to a timestamp. Refer to this thread: Python pandas convert datetime to timestamp effectively through dt accessor - Stack Overflow
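A rough sketch of the whole conversion, assuming the two columns hold time strings in HH:MM:SS form (the sample values and the call_duration_sec column name are made up for illustration; adjust the format string to whatever the real data looks like). It subtracts start from end so the durations come out positive, then turns the resulting timedelta into whole seconds:

```python
import pandas as pd

# Hypothetical sample; the real columns are assumed to be object-type time strings.
data = pd.DataFrame({"call_start": ["13:45:20", "09:10:05"],
                     "call_end": ["13:46:30", "09:25:05"]})

# Step 1: object -> datetime
data["call_start"] = pd.to_datetime(data["call_start"], format="%H:%M:%S")
data["call_end"] = pd.to_datetime(data["call_end"], format="%H:%M:%S")

# Step 2: end minus start gives a timedelta column
duration = data["call_end"] - data["call_start"]

# Step 3: timedelta -> number of seconds, stored as int
data["call_duration_sec"] = duration.dt.total_seconds().astype(int)

print(data["call_duration_sec"].tolist())  # [70, 900]
```

The resulting integer column can then be used in data.corr() or any other numeric EDA step.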

Hello, I wanted to know what exactly the default_or_not column represents. Please explain it in a little detail, if possible.

Default is the failure to repay a debt, including interest or principal, on a loan or security. A default can occur when a borrower is unable to make timely payments, misses payments, or avoids or stops making payments. Default risks are often calculated well in advance by creditors.

Thank you @manish_kc_06, I’ve seen what I was doing wrong. I really appreciate your help.

@manish_kc_06 Hi Manish,
My notebook was showing an evaluation error, and by mistake it was marked as the final submission. I am unable to upload a new notebook. Is it possible for you to enable notebook submissions for my account?
Second, I have split the data into train and test sets in an 80/20 ratio. On model prediction, it returns results for 621 rows, which is 20% of the data set. When I upload prediction.csv, it says that the predictions file has fewer rows than expected. Please help me in this regard.
Thanks in advance.

Hi @priyankagrover
Please use the test dataset that we have provided to make the submissions.

There should be 935 rows in the prediction file, as there are 935 rows in the test dataset.

OK, thanks. So only the number of rows in prediction.csv should be 935, but the number of features can differ from the features listed in test_data.csv? I have picked only the relevant features to predict whether insurance is bought or not.
For example, one of the columns, Outcome, has more than 50% null values, so I decided to drop that column. Similarly, I dropped some other columns as well. Can we do this?

Make sure the test set has all the columns that were in the train set used for training the model. @priyankagrover
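As a sketch of how the pieces fit together: drop the same columns from both files, train on the training data, and predict on every row of the provided test file. Everything here is illustrative, including the toy frames, the target and prediction column names, and the choice of DecisionTreeClassifier; the actual submission format is whatever the datathon page specifies.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy stand-ins for the train and test files (hypothetical columns and values).
train = pd.DataFrame({"age": [25, 40, 33, 51],
                      "balance": [100, 2500, 800, 60],
                      "Outcome": [None, "success", None, None],
                      "target": [0, 1, 1, 0]})
test = pd.DataFrame({"age": [30, 45, 28],
                     "balance": [500, 1500, 90],
                     "Outcome": [None, None, "failure"]})

# Columns dropped during cleaning must be dropped from BOTH frames.
drop_cols = ["Outcome"]
features = [c for c in train.columns if c not in drop_cols + ["target"]]

# Train on the training data, then predict on the provided test data,
# selecting exactly the same feature columns in the same order.
model = DecisionTreeClassifier(random_state=0)
model.fit(train[features], train["target"])
predictions = model.predict(test[features])

# One prediction per test row (935 for the real test file).
submission = pd.DataFrame({"prediction": predictions})
assert len(submission) == len(test)
submission.to_csv("prediction.csv", index=False)
```

The 80/20 split is still useful for checking the model locally; it just should not be the source of the rows that go into prediction.csv.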

Hello, I have a query. I submitted both the final assignment and the quiz for this bootcamp ahead of time, but the dashboard doesn't show it under completed bootcamps, and I haven't received the certificate either.

Hi @charanjeevkaur
We have not yet issued certificates to learners.

Okay, thank you!
Just a little confirmation, the bootcamp tag will be updated from past to completed when we receive our certificates, right?

Can you check again? It should already be showing in past.