
I don't get it. Why isn't the model open if it works? If it isn't, this is just a fart in the wind. If it is, the findings should be straightforward to replicate.


Yes, the community should force Nature to up its standards or ditch it. Software replication should be trivial in this day and age.
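To be clear, even "trivial" replication means pinning seeds and determinism flags up front. A minimal sketch of that boilerplate, assuming a PyTorch-based pipeline (names illustrative):

    # Sketch of the determinism boilerplate a replication would pin down
    # (assumes PyTorch; GPU kernels may need extra env config on top of this).
    import random
    import numpy as np
    import torch

    def seed_everything(seed: int = 0) -> None:
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        # Raise an error instead of silently using a nondeterministic kernel.
        torch.use_deterministic_algorithms(True)

    seed_everything(42)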


All these papers doing "research" on how to better prompt ChatGPT would then be unpublishable, given that API access to older models gets retired, so their findings can no longer be reproduced.

(I agree with you in principle; my example above is meant to show that standards for things such as reproducibility aren't easily defined. There are so many factors to consider.)
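(For concreteness, here's roughly what the core of such a study reduces to; a minimal sketch using the OpenAI Python SDK, with an illustrative snapshot name. Once that snapshot is retired, the exact experiment can't be re-run, whatever else the paper pins down.)

    # Sketch of an API-based prompting experiment (model name illustrative).
    # Reproducibility hinges on the pinned snapshot still being served.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4-0613",  # pinned snapshot; once retired, unrepeatable
        messages=[{"role": "user", "content": "Summarize: ..."}],
        temperature=0,  # reduces variance, but outputs still aren't guaranteed stable
    )
    print(response.choices[0].message.content)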


Well, since you put "research" in quotes, I think you also agree that this type of work does not really belong in a quality journal with a high impact factor ;)


This. Their training data doesn't seem to be open either, so it's literally impossible to replicate their model. That makes me highly skeptical.



As far as I understand it, only kind of? It's open source, but in their paper they did a tonne of pre-training, and whilst they've released a small pre-training checkpoint, they haven't released the results of the pre-training they did for their paper. So anyone reproducing this will inevitably be accused of failing to pretrain the model correctly?


I think the pre-trained checkpoint uses the same 20 TPU blocks as the original paper, but it probably isn't the exact same checkpoint, as the paper itself is from 2020/2021.
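One cheap way to settle "is this the exact checkpoint from the paper" disputes would be publishing a content hash alongside the weights. A minimal sketch, assuming the checkpoint is a directory of files (path illustrative):

    # Sketch: fingerprint a released checkpoint so reproducers can confirm
    # they're starting from the exact weights the paper used.
    import hashlib
    from pathlib import Path

    def checkpoint_digest(ckpt_dir: str) -> str:
        h = hashlib.sha256()
        root = Path(ckpt_dir)
        for f in sorted(root.rglob("*")):  # stable order across machines
            if f.is_file():
                h.update(str(f.relative_to(root)).encode())
                h.update(f.read_bytes())
        return h.hexdigest()

    print(checkpoint_digest("checkpoints/pretrained"))  # path is illustrative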



