This is a companion discussion topic for the original entry at https://dphi.tech/challenges/171
In the Data tab, the test dataset is not visible.
Please make the test dataset visible as soon as possible.
The words in the tweets are incomplete and contain some unrecognized characters, as in "Congratulati—" and "““strategically”". Could you provide tweets with clean words, or tell us how to clean them?
@organizers In the test dataset, after sorting by the rank column, we get 8 data points for each rank. So the data might cover 8 days, with the 8 rows for rank 1 each depicting a different date. But we are not given any date column. If we ranked all rows together, we would get ranks 1 to 744, but instead the ranks run from 1 to 93 for each of the 8 days.
So, could you please add the date column or clarify how the dataset is structured?
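One way to check this structure is to count how often each rank value occurs: if every rank appears the same number of times, that count is the number of days, and the number of distinct ranks is the number of companies per day. A minimal Python sketch, assuming the rank column can be read as a plain list of integers (the `infer_days_and_companies` helper and the toy data are hypothetical, not part of the challenge files):

```python
from collections import Counter

def infer_days_and_companies(ranks):
    """Infer (days, companies) from a flat list of per-day ranks.

    Assumes each day ranks the same set of companies 1..N, so every
    rank value should occur exactly once per day.
    """
    counts = Counter(ranks)
    occurrences = set(counts.values())
    if len(occurrences) != 1:
        raise ValueError(f"ranks occur unevenly: {sorted(occurrences)}")
    companies = len(counts)
    if set(counts) != set(range(1, companies + 1)):
        raise ValueError("ranks are not a contiguous 1..N range")
    days = occurrences.pop()
    return days, companies

# Toy data: 3 companies ranked on 2 days (not the real test set).
print(infer_days_and_companies([1, 2, 3, 2, 1, 3]))  # (2, 3)
```

For the numbers quoted above, 93 distinct ranks each appearing 8 times would give (8, 93), which accounts for all 744 rows.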
Hi @sameer07, you need to come up with a model that does not depend on any particular date or any particular company. It should be general, in the sense that it can predict the ranks for the next 3 days based on the current rank, followers, tweet info, etc.
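As a rough illustration of predicting the ranks for the next 3 days from the current rank, followers and tweet info, here is a minimal Python/NumPy sketch of a multi-output linear model. It assumes hypothetical numeric features (current rank, follower count, tweet count) and assumes the training data provides the ranks for the next 3 days as targets; the toy numbers are invented, and this is not the organizers' reference model.

```python
import numpy as np

def fit_multi_output(X, Y):
    """Least-squares fit of Y ~ [X, 1] @ W, with one column of W per future day."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append an intercept column
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return W

def predict(W, X):
    """Apply the fitted weights; returns one row per sample, one column per day."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ W

# Hypothetical features per company: [current_rank, followers, count_tweets]
X = np.array([[1, 5000, 12], [2, 4200, 8], [3, 3900, 5],
              [4, 3000, 2], [5, 2500, 1]], dtype=float)
# Hypothetical targets: ranks on day+1, day+2, day+3
Y = np.array([[1, 1, 2], [2, 3, 1], [3, 2, 3],
              [4, 5, 4], [5, 4, 5]], dtype=float)

W = fit_multi_output(X, Y)
pred = predict(W, X)
print(pred.shape)  # (5, 3)
```

The predictions are continuous values, so to submit valid integer ranks you would still need to re-rank the predicted scores among all companies for each of the 3 days.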
Can you let me know if we can submit more than once? Will the best score among all the submissions be taken into consideration?
@organizers, please reply.
@oyetripathi we expect the participants to do the text preprocessing on their own.
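Since preprocessing is left to participants, here is a minimal Python sketch of one possible cleaning step using only the standard library. It assumes the stray characters are mostly typographic quotes, dashes and ellipses plus other non-ASCII debris; the `clean_tweet` helper is hypothetical. Words that were already truncated in the data (such as "Congratulati") cannot be restored this way and are kept as-is.

```python
import re
import unicodedata

# Map common typographic characters to plain ASCII equivalents.
_REPLACEMENTS = {
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # horizontal ellipsis
}

def clean_tweet(text):
    """Normalize a tweet to lowercase ASCII words, mentions and hashtags."""
    for src, dst in _REPLACEMENTS.items():
        text = text.replace(src, dst)
    # Split accented letters into base letter + accent, then drop non-ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"https?://\S+", " ", text)       # remove URLs
    text = re.sub(r"[^A-Za-z0-9@#'\s]", " ", text)   # keep letters, digits, @, #, '
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("\u201c\u201cstrategically\u201d Congratulati\u2014 https://t.co/x"))
# strategically congratulati
```

Whether to keep mentions, hashtags and apostrophes is a modelling choice; adjust the character class in the second `re.sub` to suit your features.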
@muqeet Yes, you can submit more than once; the best score among all the submissions will be taken into consideration.
@dphi_official What is the submission limit for this competition?
It is showing that we have exceeded our submission limit, even though we have submitted only 3 files today. Previously, there was no limit on the number of submissions per day.
Please send us updates about the limits.
@sparshal Sorry, I could not attend the doubt-clearing session due to my college schedule. Basically, we need to prepare a model that takes in, say:
a random variable representing the sentiment of the tweet (positive, negative, etc.)
count_tweets
avg.
...
followers
change
rank
as an input vector, and outputs the ranks on subsequent days as an output vector.
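To make the input vector concrete, here is a Python sketch that turns one row into a numeric feature vector by one-hot encoding the sentiment label and appending the numeric columns. The sentiment label set, the `sentiment` key and the dict-based row format are assumptions for illustration; only count_tweets, followers, change and rank are column names mentioned above, and the elided "avg." columns are left out.

```python
SENTIMENTS = ("negative", "neutral", "positive")  # assumed label set
NUMERIC_COLUMNS = ("count_tweets", "followers", "change", "rank")

def to_input_vector(row):
    """Build [one-hot sentiment..., numeric columns...] from a dict-like row."""
    one_hot = [1.0 if row["sentiment"] == label else 0.0 for label in SENTIMENTS]
    numeric = [float(row[col]) for col in NUMERIC_COLUMNS]
    return one_hot + numeric

row = {"sentiment": "positive", "count_tweets": 12,
       "followers": 5000, "change": 0.8, "rank": 3}
print(to_input_vector(row))  # [0.0, 0.0, 1.0, 12.0, 5000.0, 0.8, 3.0]
```

One-hot encoding avoids implying an ordering or distance between sentiment labels, which a single integer code (e.g. 0, 1, 2) would.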
If this is the case, then what is meant by "Here the rank is calculated based on percentage change in number of followers"? That is a column in the train dataset, so what exactly does "based on" mean? Is there a formula that computes the rank from the percentage change? Please explain a bit. Thanks.
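A common reading of that sentence is: compute each company's percentage change in followers, then rank the companies by it, with rank 1 for the largest increase. The Python sketch below assumes exactly that (descending order, ties broken alphabetically by company name); the helper names, company names and follower counts are hypothetical, and the challenge's actual formula may differ.

```python
def pct_change(prev, curr):
    """Percentage change from prev to curr, e.g. 200 -> 250 gives 25.0."""
    return (curr - prev) / prev * 100.0

def rank_by_growth(followers):
    """followers: {company: (previous_count, current_count)} -> {company: rank}.

    Rank 1 = largest percentage growth; ties are broken alphabetically.
    """
    changes = {name: pct_change(p, c) for name, (p, c) in followers.items()}
    ordered = sorted(changes, key=lambda name: (-changes[name], name))
    return {name: i for i, name in enumerate(ordered, start=1)}

followers = {"A": (1000, 1100), "B": (200, 210), "C": (5000, 4950)}
print(rank_by_growth(followers))  # {'A': 1, 'B': 2, 'C': 3}
```

Here A grows by 10%, B by 5% and C shrinks by 1%, so a small account can outrank a large one if it grows faster in relative terms. A company with zero previous followers would cause a division by zero and needs separate handling.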