Well no, I wouldn’t. Not because I think piracy is wrong, yo ho ho. Not because I think machine learning is completely worthless. Because garbage in, garbage out. The vast majority of humanity’s written works are contradictory, use flawed logic, are based on flawed information and data, are intentionally misleading, or are outright fabrications. Even with trustworthy work there’s so much assumed context that it could easily be misinterpreted or indecipherable. Failing to curate training data is just asking for bullshit.
I agree with the stuff about written works being contradictory, but I think your comment is a little reductive about machine learning.
Machine learning has rapidly transformed many areas; here are a few:
Object recognition
Facial recognition
Medical imaging
Language translation
Speech recognition
Text generation
Drug Discovery
Genomics
The list is rather endless, really. Take text recognition and computer vision. People who are blind can now wear Meta (shit company, I know) glasses and actually go shopping, pick up items, and have the labels read to them. That’s fucking awesome.
Yeah, I was only talking about LLMs, which honestly don’t feel like the best use of machine learning to me. Real scientists and engineers using machine learning to create efficient heuristics for real, bounded problems, and actually verifying the output through conventional means, is incredibly powerful. There’s still a lot of overhyped bullshit out there outside the LLM chatbot space, but the point stands that any machine learning algorithm should have its training data carefully curated. The techbro strategy of throwing more nodes and random data at an LLM and hoping it will magically hit some exponential threshold of performance is stupid.