My point is that you wouldn't want to pick and choose albums - the complete opposite of what you are talking about. I feel like you aren't reading my messages, as I'm obviously not talking about your average music listener
oh u guys are talking about some ai s*** lol nvm
idgaf about ai
Yeah, you don't need to download the whole thing. But if you're not downloading the whole thing, what's the difference from just using another music scraper where you can already download music in batches?
That scale is definitely feasible with other scraping tools. You can easily download whole discographies via torrents that add up to that size.
If you're super determined, yes, that's true, but my feeling is your average ML person is not going out of their way to download that amount of songs, with the needed thought into what discographies are good to search for, how big each discography is, etc. Whereas now, with the dataset readily available, it's an 'interesting problem' ready for someone to tackle.
So that's what I mean about less friction. It's like having a Kaggle competition ready for you to tackle. (I assume you work in the field based on your msgs, but if not then apologies if I'm using terms you're not familiar with)
I also think close to 300 TB is not too much of an ask for an individual/team that really wants to tackle this. You won't be storing that data long term and will never need to store it all at once, as you'll just convert it into features to train your model on and then delete the audio files.
Anyway, in case this comes across as rambling now then sorry lol, was a good convo though
I mean they could just pick the top listed artists and scrape their discogs; it would even be better, since it would be in higher quality than this database.
While yeah, having it all in one place makes it easier to download, I don't think the difference is big enough to be against this.
For a team doing the full 300 TB, storing the data wouldn't be the biggest roadblock; it would be training on all that data and how much power that would take
I don't really see much correlation between the 300 TB of files and the power needed to train a model. That 300 TB will be much reduced when you convert it into the data you will actually train on.
It's possible we are thinking of training on different things though.
The point you make about it being higher quality on things like Soulseek is true, I agree. Ultimately though, I think what will probably happen is some startup or existing company downloads the data and uses it in a model that's used behind the scenes in some type of service for labels/musicians.
Do you work/study in the field?
Larger models end up using more energy to train
Not data science, but Electrical and Computer engineering
Yes, that's true, but I meant that it won't be trained on literally 300 TB of data. (There is a correlation between dataset size and power needed to train, but 300 TB of audio files =/= 300 TB dataset size)
And cool, nice
Electrical engineering is still such a good foundation to have
Anyway I feel I've derailed the thread enough lol
Soulseek has the stuff that isn't on streaming. Would be lost without it tbh
I mean did I say otherwise
Welp
https://twitter.com/i/status/2044446399811715389
in the immortal words of Bossman Dlow
"I ain't never turned myself in
Do your job b****, come find me!"
they might've won that default judgement but it'll be hell getting any of the money they're now owed