OpenAI is giving Microsoft exclusive access to its GPT-3 language model (technologyreview.com)
172 points by ColinWright on Oct 10, 2020 | hide | past | favorite | 89 comments


Read this: https://openai.com/charter/

Then read this: https://openai.com/blog/openai-licenses-gpt-3-technology-to-... (the very first dot point)

Dear OpenAI, please provide more details of the arrangement with Microsoft. The lack of transparency here shows a clear departure from your charter.


This is a very vexing situation. I applied for access months ago. I am a scientist and academic with dozens of articles on topics related to GPT (and especially survey articles and a few articles reasonably critical of some NLP/ML/DL frameworks). I am not sure I can reveal more without doxing myself.

I haven't heard anything from them yet. They are free to not release GPT-3, but they are not free to mislead the community with this holier-than-thou "open" posturing.


The charter only constrains their use of AGI, which they define at the top in a way that excludes all of their current and near-future work.


"We commit to use any influence we obtain over AGI’s deployment to ensure it is used for the benefit of all, and to avoid enabling uses of AI or AGI that harm humanity or unduly concentrate power."

That sounds to me like they committed to avoid using any AI to concentrate power.


But it is written: "unduly" concentrate.

The whole sentence is just yada-yada anyway. Every "commitment" is constrained by some undefined or impossible exception. Defining "unduly" would be comparatively easy next to finding a definition for what could be meant by "harm humanity" or "for the benefit of all". That last constraint is nonsense in itself: does "all" exclude Microsoft, who would not have benefited had they not gotten an exclusive contract? Not to mention all the people who will lose their jobs as those jobs are eventually replaced by AI. They didn't even say "all equally", which would at least make a little bit of sense.


Can someone who has done work for OpenAI sue them for not sticking with their charter?


I just whipped up a Chrome/Firefox extension to replace all instances of "OpenAI" with "ClosedAI" (or anything else you like) on the web, in the spirit of the legendary Cloud to Butt. Help yourself.

https://github.com/clopen/ClosedAI


Bullshit.js: add "OPENAI" to the dictionary


Why the downvotes? Does it contain "trigger words"? ;-)

I think even the downvoters would have liked it if they had looked it up! See for yourself:

https://mourner.github.io/bullshit.js/


:thumbs_up:


I think an even bigger issue than something like GPT-3 is how inaccessible a lot of these ML techniques are without huge datasets that are only available to huge companies like FAANG and the social media giants. While Twitter is definitely not the best site for a lot of research, it ends up getting used for tons of research because it is one of the only companies that allows others to access even a sample of its data without jumping through lots of hoops.

I think opening up the models is a good start, but I think it is even more important that we allow responsible use of these huge datasets. I also think that ensuring responsibility probably needs to be done by governments rather than companies since allowing companies to apply a lot of restrictions to data access leads to them cutting access off to everyone. If we need to restrict access to certain data because it might be abused, either the collection of that data needs to be banned for all companies or maybe we need to place restrictions on how that data is processed/used rather than restricting its access to a few people.

Disclosure: I work as a SE at Microsoft.


Yes, I think the people should be in charge of their data. And if a company grows because of user data, then that company should be owned by the people who fed it that data.

In the old days we would use telephony to talk to each other, and even the thought of a company listening in on the conversation was abhorrent. But nowadays it seems like it's business as usual. What went wrong?


Actually, it's not the dataset that matters, it's the access to unbounded compute. GPT-3 is trained on data scraped from the web...


I really can't stand the bait and switch branding of OpenAI. They are anything but open, and there is no reason to believe their tall tale that the only reason they don't release the model publicly is "ethical" reasons -- it's for monetary reasons and it's gatekeeping. Not only that, but they are playing a dangerous game strategically. Companies like Hugging Face actually release valuable parts of the open source NLP ecosystem which set new records for SOTA and which the rest of the industry can use.

For all the practitioners here who probably know more about this than I do, help me understand: how much of OpenAI is smoke and mirrors? Is it just me, or is GPT-3 just GPT-2 but with way more training data? Are there fundamental conceptual differences with how the scale of that data is handled?


OpenAI is exploiting the rules of society, as every human entity does under sufficiently strong optimization pressure.

"Sufficiently Powerful Optimization Of Any Known Target Destroys All Value" https://thezvi.wordpress.com/2019/12/31/does-big-business-ha...

In the next 50 years, the digitization of everything will cause optimization pressure to steadily increase. We have an "unstoppable force meets immovable object" situation, which historically results in societal collapse / war / "black swan" events.


Pretty much share this sentiment. I read their FAQ page and it left a really bad taste in my mouth.

A further question that I'd like to add - are there any _truly_ open source initiatives that can remotely compete with GPT-3? What's stopping the OSS community (specifically, not just "funding")? Is it lack of compute power? Lack of data?


Access to the right compute infrastructure. While the training itself cost an estimated $4.6M (unsure if this includes hyperparameter search done on smaller models beforehand), even if an open source organization had that money available, I don't believe they would be able to access the kind of infra needed to train GPT-3. The closest thing would probably be some TPU pods on GCP, but even then, I think the biggest pod you can publicly access is too small (unless you are ok training for 3 years).


> even if an open source organization had that money available, I don't believe they would be able to access the kind of infra needed to train GPT-3. The closest thing would probably be some TPU pods on GCP, but even then, I think the biggest pod you can publicly access is too small (unless you are ok training for 3 years).

Surely, at that budgetary scale, buying/building your own infrastructure would be cheaper than renting it? This is the same economic calculation large animation studios have to make WRT render farms (which also use similar types of hardware).


The price of a cluster similar to the one OpenAI used is orders of magnitude higher. We are talking about a price tag that at the very minimum would be in the $150M+ range, and you can probably multiply that by 2 or 3 (10 thousand V100s, ~300k CPU cores, InfiniBand between every node, network storage, etc.).

Almost no one has that kind of money to invest up front (especially not an hypothetical open source organization like discussed here). Even if somehow one had that money available, for this to be more economical than renting you need to have it running at full capacity at all times for years, and not many research groups can come up with enough interesting projects at a large enough scale to utilize that kind of cluster efficiently.

So buying something like that is only conceivable for huge research groups with hundreds of researchers and tons of cash: Google Brain, DeepMind, Microsoft and probably Facebook. Even OpenAI wasn't able to afford it directly...
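To make the scale concrete, here's a rough back-of-envelope sketch. The figures (total training compute of roughly 3.14e23 FLOPs, sustained V100 throughput of about 28 TFLOPS) are third-party public estimates, not official OpenAI numbers:

```python
# Back-of-envelope GPT-3 training-time estimate. Both constants are
# rough public estimates, not official figures.
TOTAL_FLOPS = 3.14e23          # estimated total training compute
V100_SUSTAINED_FLOPS = 28e12   # assumed sustained mixed-precision FLOPS per V100

def training_days(num_gpus):
    """Wall-clock days of training at the assumed sustained throughput."""
    seconds = TOTAL_FLOPS / (num_gpus * V100_SUSTAINED_FLOPS)
    return seconds / 86400

print(f"single V100:  {training_days(1) / 365:.0f} years")
print(f"10,000 V100s: {training_days(10_000):.0f} days")
```

On these assumptions a single GPU would take centuries, while a ten-thousand-GPU cluster finishes in weeks, which is why only a handful of organizations can even attempt it.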


> especially not an hypothetical open source organization like discussed here

Hypothetical open source organization is not hypothetical: https://www.eleuther.ai/home

However, the compute constraints are correct. We plan on using TFRC's TPUs in order to replicate GPT-2 (with a few changes to support scaling to GPT-3 and beyond as well as the larger context length). With success on a GPT-2 model we hope to garner enough interest to get more compute to train a full-sized GPT-3 model. After that, it's 1T or bust.


> After that, it's 1T or bust.

A trillion parameters? Damn.
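For a sense of what a trillion parameters means, even just storing the weights dominates. A quick sketch, assuming 2 bytes per parameter in fp16 (optimizer state and activations multiply this several times over):

```python
def fp16_weights_gb(num_params):
    """GB needed just to hold the weights in fp16 (2 bytes per parameter)."""
    return num_params * 2 / 1e9

gpt3 = fp16_weights_gb(175e9)   # GPT-3's 175B parameters: 350 GB of weights alone
one_t = fp16_weights_gb(1e12)   # a 1T-parameter model: 2,000 GB (2 TB)
```

Neither fits on any single accelerator, so serving, let alone training, requires model parallelism across many devices.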


Existing HPC projects (aka "supercomputers") show that money is available for hardware-intensive research projects (perhaps the lesson here is that the US needs an AI/ML focused National Lab), and as far as the cost of creating a bespoke cluster for a single project, I think that running it full bore for a project and then selling it off ASAP would avoid most of the downsides you've mentioned.

Again, I have a feeling that big-budget render farms have trod this path before.


I think it's just money. I'm not an expert, but my understanding is there's nothing fundamentally new or revolutionary about GPT-3. It's not a new AI paradigm, it's just taking existing techniques and turning them up to 11.

If you throw absolutely enormous amounts of training data and compute resources at a bigass neural network, you get surprisingly good results. Anyone else with a few tens of millions of dollars could do similarly.


For GPT-3, I think they mainly scraped stuff off the internet. Anyone with enough time and machines can replicate that, but it is a nonzero effort. GPT is an exception here, I think: it's harder to build open source MRI diagnosis tools, for example, when nobody gives you open MRI scans with diagnosis data.

It's more a lack of compute power problem plus a lack of people problem. The compute cost alone was estimated to be 4.6 million USD. As for the people, folks who can push SOTA are extremely sought after. So the individuals involved also pay a large opportunity cost by working on things for free or comparatively small open source pay.

Also, I think many open source organizations/communities haven't recognized the value of open source ML models yet and don't invest in them as much as in parametric models. I think this will come with time. Last, anyone with a computer can write code, while few people have access to the hardware needed.


Possibly a hot take:

I would say that OpenAI is still “open”, in about the same sense that Nginx or Redis is “open”. They seem to be pursuing an open-core model, where the architecture and skeleton of the project is open, but the extra stuff that lots of corporate money was dedicated to building on top is closed.

Remember, GPT-3 is just GPT-2 with (a lot) more training data—training data that cost a lot of money to acquire, and therefore training data that (unlike GPT-2’s training set) has corporate interests invested in its creation. GPT-2 is still the “core” of GPT-3’s architecture; GPT-3 can be seen as a commercial “extension” on top of the GPT-2 core.

If you want to understand “how GPT-3 works”, you can just study GPT-2. Nothing is different architecturally between them.

It is a shame that GPT-3 isn’t open for study, because it has emergent capabilities (meta-learning) that GPT-2 didn’t express, and which are worthy of study. But they couldn’t have known it would do that when they set off to create GPT-3; and they did know they needed a bunch of money for compute + training-data scraping, so they likely made agreements (handshake or contractual) to license this beefed-up “extension” to GPT-2 in advance of knowing what it could do; and in fact as a prerequisite to getting that funding to build it.

I would suspect that OpenAI’s goal with GPT-3 was mostly to observe for themselves what a much-better-trained GPT-2 would be capable of; to study those observed capabilities (like meta-learning), and to try to figure out if they can replicate them using only tweaks to GPT-2’s architecture, without needing the big load of commercially-funded training data. (Sort of like seeing what optimal solutions a brute-force algorithm can find, and then trying to come up with an efficient algorithm to emit those same solutions.)

If they can come up with something like that, then there’s likely a GPT-4 brewing right now, which wouldn’t need all that compute and training data to create; and therefore wouldn’t be beholden to corporate interests; and therefore would be open for study.


The "Open" in OpenAI means that they want their customer to open their pockets, it has nothing to do with Open Source, open core or anything like that.

The cultural war of "but they deserve to make money" is misguided. You are free to make money; just don't call your projects open source if, for one reason or another, you do not want to open up the source of your project. And don't call it open core if you don't want to open up the core of your project. GPT-3 might just be "GPT-2 plus something more", but a proprietary codebase forked off of Linux would also just be "Linux plus something more"; that doesn't make it open source or open core, because neither the core nor the source is open.

That said, I am not against the OpenAI project going forwards with early releases of proprietary offerings, it seems like a fine way to finance their work, and they can still support the open source community. But criticism of the choice is warranted if they keep trying to brand themselves as open.


I would compare GPT-3 to the original Doom. In both cases, the codebase/runtime/client part is open source; while the asset data that creates the experience everyone is familiar with, is proprietary. In both cases, you’re free to create your own asset data to use with their codebase/runtime/client.

In both cases, I find it helpful to imagine an alternative world where the “engine” and the “assets” were created by separate companies.

In such a world, you could imagine Id.A publishing a standalone open-source “Doom engine”; and then Id.B making a proprietary game called “Doom” using that engine.

Similarly, in such a world, you could imagine OpenAI.A publishing a standalone open-source “GPT-3 architecture”; and then OpenAI.B training a proprietary model called “GPT-3” using that architecture.

In both cases, that’s exactly what happened; only Id.A and Id.B were both just Id; and OpenAI.A and OpenAI.B were both just OpenAI. And because of that, each company decided to refer to the whole thing they were doing—both the open engine, and the proprietary assets—as a single project with a single name (where if they had been two separate companies, the necessity of securing trademarks would have pushed them to have two separate names for those two things.)

People’s moral paranoia in cases like this, comes down to the fact that there’s a name collision that they aren’t noticing. The engine of the original software Doom, and the assets of the original game Doom, are both just called “Doom”; and likewise, the architecture of GPT-3, and one trained model built on that architecture, are both just called “GPT-3.” If you separate them, it’s clear that one is intended to be open-source, while the other is not. But, for whatever reason, the companies involved didn’t separate the concepts, leading to this illegibility of whether “GPT-3” is open-source or not. It’s a meaningless question, because the label “GPT-3” isn’t cleanly pointing to a single concept.


> If you want to understand “how GPT-3 works”, you can just study GPT-2. Nothing is different architecturally between them.

This isn't true. At the very least, off the top of my head, GPT-2 didn’t have the sparse layers that GPT-3 has.
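For intuition, here's an illustrative sketch of what a "locally banded" sparse attention pattern looks like, in the spirit of the Sparse Transformer work; this is my own toy example, not OpenAI's actual implementation:

```python
def banded_strided_mask(n, window, stride):
    """Causal attention mask where position i may attend to position j only
    if j is within a local window behind i, or j lies on a fixed stride grid.
    Illustrative sketch only, not OpenAI's implementation."""
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):                      # causal: j <= i
            if i - j < window or j % stride == 0:   # local band or stride grid
                mask[i][j] = True
    return mask

# Dense causal attention allows O(n^2) pairs; each row here permits only
# O(window + n/stride) positions, which is what makes long contexts cheaper.
m = banded_strided_mask(8, window=2, stride=4)
```

The mix of local and strided connectivity is what lets information still propagate across the full sequence after a few layers, despite each layer being sparse.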


That seems logical and like it could hold true, but the spirit of being "open" is completely lost once you involve commercial, for-profit interests. I would like to highlight co-founder Elon Musk's thoughts on this as a clear deviation from their initial goals:

"This does seem like the opposite of open. OpenAI is essentially captured by Microsoft."


I would think that GPT-4 would be exclusively licensed to one of the other major players for even more money.

Just because it doesn't "need" the compute/training data to create doesn't mean people aren't going to sell it for money.

People need to eat at the end of the day.


Old news and already discussed: https://news.ycombinator.com/item?id=24558329


Thus changing its name to ExclusiveAI? No? Still the air of moral superiority?


When Progress Inc. changed the name of their awful-yet-surprisingly-popular closed-source (even anti-open-source: https://www.progress.com/tutorials/odbc/open-source-database) 4GL from ProgressDB to "OpenEdge", it was a reminder to me that there is absolutely nothing to a name.

"A rose by any other name would smell as sweet" - similarly: "Open[BLANK] by any other name would be just as much marketing bollocks".


Progress 4GL was an excellent language for many years, a superb framework for building CRUD applications with inline data manipulation features that long predated Linq etc.

Eventually it fell so far out of the norm it became difficult to sustain as a career choice but it doesn't warrant your dismissal I think.

Agree the name change was dubious, though an understandable attempt to get away from dated 4GL connotations.


> but it doesn't warrant your dismissal I think

I agree that at their start in the late 1980s on IBM's AS/400, Progress was a nice system to build on top of. But only for those few years.

The main problem with Progress 4GL - just as with all other 4GL systems - is that they're vertical silos: 4GL vendors wanted devs to stay inside their own proprietary ecosystem and they made little-to-no attempt at integrating with anything third-party. The fact it took until the late-1990s for Progress to add SQL support (and requiring hours of sysadmin work editing low-level configuration files just to get their "broker" system working) shows where they saw their money coming from: small-to-medium-sized business customers' business systems locked in to their 4GL system's OLTP model.

I won't go into the major inherent problems in 4GL vertical silos - the fact they've all completely failed or fallen into irrelevance by now is evidence enough (Progress, Paradox, FoxPro, etc). Businesses now realize that interoperability and data portability are very important, which is why database vendors like Oracle, Postgres, and others saw success in offering only a database product - and aim to be the best database product with the best interoperability.

I have an axe to grind with Progress Inc over this very point: there's still a good number of business customers of Progress systems written decades ago that are looking for a way out. I was asked to build a simple data-connector to slurp data from an on-prem Progress database and that's where I ran into not only technical problems, but primarily licensing problems: Progress Inc (the company) made it very clear to me that the only way I'd be able to write a program to get data out of that on-prem database is by ponying up north of $3,000 USD just for an SDK for my own machine, in addition to extra "per-user" licenses for each on-prem site that would be using my software to liberate/exfiltrate their data (never mind that those on-prem sites already had Progress licenses - but they didn't have the right kind of user license). Compare that to the predominant database vendors around today, which are either explicitly open-source or give developer licenses to their Enterprise SKUs away for free - and which put effort into making solid client libraries for rival platforms. Progress' sheer intransigence and their continued opposition to the past 25+ years of industry trends is just mind-boggling.

(I wryly note that only in the past couple of years or so has Progress started to admit their product isn't as great as... frankly, everyone else. So I see they've changed the company's direction away from their 4GL and onto buying-up component vendors like Telerik - unfortunately for them I feel this is coming too late because now even being a proprietary component-vendor feels outmoded given that modern UI platforms and frameworks are well-serviced by gratis and libre open-source offerings, especially first-party libraries like Google's Material Design and Microsoft's Metro/Modern/Fabric/Fluent/WhateverItsCalledThisYear.)

----

Footnote:

You wrote:

> inline data manipulation features that long predated Linq etc

From what I gathered of the OLTP system built into their 4GL, it's only superficially similar to Linq. It isn't a true relational-calculus system (like Linq), nor a relational-algebra system (like SQL). Please correct me if I'm wrong - my experience with Progress is limited to versions prior to 11 and I understand they have added some modernizations to their platform - but it's still very long in the tooth.

Progress Inc's problem is that their software isn't very sexy, which means they'll have problems attracting the best and brightest in the industry - after all, if you're a hotshot kid from Stanford who solved the object-relational impedance mismatch problem - why would you want to work for Progress?


> The fact it took until the late-1990s for Progress to add SQL support

No idea where you are getting that from; SQL has always been supported, but there was no reason for it to be a priority given the vastly superior 4GL for building applications against Progress, at least until the 2000s.

> database vendors like Oracle, Postgres, and others saw success in offering only a database product

Different companies, different priorities; Microsoft is an obvious counter to that claim.

Regarding the comments on Linq, SQL, etc.: no, my point was that it was direct and inline access to data. It was a very successful but niche approach that eventually got traction as a concept and alternative to SQL when tools like Linq arrived. But Progress devs had had that ease of data access in their programs for years, under the cover of the unfashionable 4GL label.

I agree with you on its general awkwardness wrt industry trends, and its obscure pricing and marketing strategies.


> No idea where you are getting that from, SQL has always been supported

My mistake. I did some research and found this page (http://www.oehive.org/VersionHistory.html), which says that SQL-89 support was added in 1988, but SQL-92 support wasn't added until 2000.

But even so, SQL-89 is very anemic and not very expressive compared to SQL-92 (I'm thinking primarily of the support for different types of JOIN expressions), and the fact it took them 8 years to add SQL-92 support is telling (consider that MS Access 97 supported most of SQL-92). I can't find any information suggesting that Progress/OpenEdge implements later standards like SQL:1999 and SQL:2003, which add features I use on an almost daily basis (recursive CTEs and window functions, respectively).

> my point was it was direct and inline access to data

Doesn't that place architecture-based restrictions on performance though - how well does concurrency and transactions work in their OLTP model?


> (recursive CTEs and window functions respectively).

The ABL language has supported functionality like that for ages.

> Doesn't that place architecture-based restrictions on performance though - how well does concurrency and transactions work in their OLTP model?

By what metric?


What does this have to do with the discussion about OpenAI changing not just strategy but what feels to many as changing their core values that they always promoted as being the central reason for them existing?

"But what about ..." is irrelevant

https://en.wikipedia.org/wiki/Whataboutism


> What does this have to do with the discussion about OpenAI changing not just strategy

Nothing to do with that.

Nor is this a case of whataboutism. I am not justifying one company's actions based on another unrelated company's poor business decisions.

I am stating that given this deal with Microsoft, the "Open" in OpenAI's name is now meaningless and/or dishonest. I used Progress/OpenEdge as another example of a company putting "open" right into their name but failing to live up to it - implying that Progress' name change (and also OpenAI's initial name) was a cynical marketing ploy to make people believe they're the _good guys_ in all this, especially when the AI/ML field has many detractors from the altruistic camps now that deepfakes, Palantir, Cambridge Analytica, and Facebook have all left a bad taste in everyone's mouth.


Fair enough and thanks for the elaboration. I had initially read your comment as excusing OpenAI as others abuse 'open' marketing as well.

I think we are mostly on the same page here.


I hope they can give the "Open AI" name to Hugging Face instead. This exclusive agreement is anything but "OPEN".


OpenAI's need for these types of "closed" deals is driven by the fact their technology is fundamentally dependent on data & compute infrastructure that only a few organizations in the world can afford. It doesn't matter how many white papers they publish making AI "open" if it costs $100Ms to train & deploy.

You could do more to make AI "Open" by working on semiconductors to bring design, test, and fab costs down by orders of magnitude or working on modeling which drastically reduces training & data requirements.


Who said Microsoft is getting an "exclusive license"? The OpenAI blog post doesn't. It's licensed to Microsoft now, and presumably to other companies in future (to not do so would be leaving money on the table).


> Who said Microsoft is getting an "exclusive license"? The OpenAI blog post doesn't. It's licensed to Microsoft now, and presumably to other companies in future (to not do so would be leaving money on the table).

It's right there in the article, including a link to Microsoft's announcement, which says, among other things:

"Today, I’m very excited to announce that Microsoft is teaming up with OpenAI to exclusively license GPT-3 [...]"

I'm pretty sure that when Microsoft claims they have an exclusive license to some IP, they actually do.


Thanks. I didn't see Microsoft's announcement.


Well Microsoft did give them some sort of discount or something for their azure servers for training. I think it was part of the agreement or something


> Well Microsoft did give them some sort of discount or something for their azure servers for training. I think it was part of the agreement or something

I think about half of the money Microsoft invested in OpenAI has been cash, the rest was in the form of Azure credits.


I realize it isn't obvious why this probably precludes a discount.

For any given deal, it is much simpler for all parties to adjust the amount of credits being given than to negotiate a discount.


The GPT-3 paper clearly states that they used an MS GPU cluster, with an estimated cost of, what, millions? Why is it a surprise, then, that Microsoft gets exclusive access?

I think the real story is something like: the rockstar ML team at OpenAI sells its projects to the highest bidder in terms of hardware access, and Google and Amazon just missed the boat.


Contrarian Opinion: It doesn't matter one bit.

Take a walk down any science museum, and you see a consistent pattern in technology development: people will push an existing technology to its absolute extreme before they come up with the next clever step.

That's GPT-3. It's no holy grail.

I find work of the kind that Hugging Face are doing much more interesting, because they are trying to move forward. Their approach is trying to overcome two stumbling blocks for which GPT-3 has no answer: One, that useful language is domain-specific (in the sense that, beyond the written words, it presupposes a familiarity with a world of context). And two, that the way forward is not going big, it's going small. In practically every aspect of historical technology development, the vector for improvement was towards being able to do more with less.

These are not the only problems GPT has. Other potholes include its blindness to text structure; this covers logic, and is also why it has that rambling, incoherent style.

So to me, GPT-3 never really crosses over from an academic curio to something that allows us to do something we've not done before. Can it somehow be used to sell EVEN MORE ads? Possibly. Hoorah.

But it's not really the milestone OpenAI would like you to believe.

(However, if anyone can package Elon Musk's capacity to spin...)


I still haven't seen or read about a useful use case for GPT-3. Maybe procedurally generated chitchat between NPCs in video games, or an advanced lorem ipsum generator, but that's about it.


Yes. People were literally claiming to be able to write code from English-language descriptions. One guy on Twitter generated a ton of hype with cherry-picked examples... hype even from people who should know better, like Eliezer Yudkowsky.

https://twitter.com/sharifshameem/status/1284095222939451393...


Also interesting to note that the hype came a while after the release of GPT-3 (the tweet was about a month later), and that many of the examples could plausibly have been generated by GPT-2 (which didn't get as much hype). Put another way, you could show people GPT-2 examples from a year ago, claim they were from GPT-3, and I don't think many would know the difference.

I do think GPT-3 has shown improvement, and it's a step forward, but it probably tells us more about how humans interpret than about how AI might work. I wrote about it more in the link below:

https://avoidboringpeople.substack.com/p/doctor-gpt-3


GPT-3 says: "Well, it's not intended for that. It's more for meaningful conversations like you have with people in real life. Look, use cases are not my area of expertise, you'll have to talk to the project lead about that."

GPT-3's use case is fun.


Auto-formatting punctuation - or syntax. You can show it a few examples and it will insert punctuation into raw words coming in from speech recognition, or convert badly formatted Python into well-formatted Python. It can correct syntax errors in C++ (!).

More generally, it can convert text from a simple, unstructured form into more complex, more structured form, with just a few example transforms. It's astonishingly good at it, but I haven't seen much on the public net about it. I don't know why people aren't exploring it more tbh.
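The mechanism here is plain few-shot prompting: you concatenate a handful of example transforms and leave the final output blank for the model to complete. A minimal sketch (the Input/Output labels are my arbitrary choice, not any required API format):

```python
def few_shot_prompt(examples, query):
    """Build a few-shot prompt from (raw, formatted) example pairs; the
    final 'Output:' is left blank for the model to complete."""
    parts = [f"Input: {raw}\nOutput: {formatted}" for raw, formatted in examples]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    [("hello how are you", "Hello, how are you?"),
     ("fine thanks and you", "Fine, thanks. And you?")],
    "good to hear that",
)
# `prompt` would then be sent as-is to a text-completion endpoint; the
# model continues with a punctuated version of the final input.
```

The same pattern works for any of the transforms mentioned above (code reformatting, syntax fixes): only the example pairs change.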


Now that it is a captive technology, the question becomes why should anyone want to explore it other than Microsoft.


It's not captive. It's quite replicable (and impossible to monopolise), it's just expensive.


I’m the founder of copy.ai, a GPT-3 app that generates short form marketing copy. We have paying users already who find it helpful.


A scathing critique of both GPT-3 and marketing copy


This is unjustly harsh. At least it's got tangible benefits and customers without harming a soul. GPT-3 can be and is likely already weaponised, it's nice to see anyone out there using it for humble things rather than straight-up digital warfare.

I'm hoping we can finish some of Tolkien's unpublished books (Please give me beta access OpenAI heh)


> [...] GPT-3 can be and is likely already weaponised, it's nice to see anyone out there using it for humble things rather than straight-up digital warfare.

Well, not that I necessarily agree with the decision (I'm undecided), but one point in favor of only giving access to the model via an API rather than releasing it, is that the use can be monitored and hopefully "weaponization" will be either noticed or detected[0].

It would be pretty interesting if OpenAI eventually provided an external "ML API abuse monitoring" service, but one problem I haven't figured out a solution to is that when providing such a service, the goals of "reducing harmful use" and of "slowing the arms race"[1] (which are both valid goals) are somewhat opposed. I'm still thinking about that.[2]

[0] Using ML to detect harmful uses of an ML API would be a quite interesting research topic, and quite in line with OpenAI's stated mission, but gathering the necessary data set for training purposes may require allowing harmful use of the API in the first place (though I have thought of a few ways to mitigate those risks and limit the actual harm).

[1] http://xkcd.com/810/ is amusing, but misses the point. Spammers aren't just attempting to evade filters, but also trying to accomplish their goal, so inadvertently training a bot capable of writing an apparently constructive comment that successfully sneaks a spam link through to be clicked on and/or indexed isn't exactly a win for the good guys.

[2] I wouldn't be surprised in the least if a GAN-like social deception arms-race (especially since the same networks have to serve as both generator and discriminator) was the proximate cause of the Upper Paleolithic Revolution.


Oof, such an awesome technology whose use is going for something like marketing copy. I'm sure it pays the bills for you, but is kind of gross overall.


It's the AT-5000 auto-dialer all over again.


In the future, AI will persuade people whom to vote for in coming elections.


Yes, if only they put their efforts towards generating low effort, self-righteous posts on message boards.

Marketing isn't evil, and it isn't the end of the world.


Of course marketing isn't evil, and I didn't say that. Marketing can be quite awesome and impressively done. However, you know AI/GPT-3-based writing of marketing copy will lead to lowest-common-denominator copy scaled out to everyone, with tweaks for each specifically targeted person. My opinion is that that's as gross as ML people at Google optimizing for every ad click with slight tweaks to maximize revenue at any cost. Gross.

As to your dig, my conscience is clear; I made a conscious effort to avoid such ends in my professional life, sometimes to my short term detriment. My personal time is mine to do with what I please, even if you consider some of that time to be used in a self righteous or low effort way. It's not self righteous to point out bad things.


In the blog post by OpenAI it doesn't mention it's an exclusive deal: https://openai.com/blog/openai-licenses-gpt-3-technology-to-...


Given the frequency, HN needs an automatic OpenAI submission checker that changes all titles to read 'ClosedAI' instead.

How far the original initiative has fallen; how smug it now looks.


Perhaps someone can write a bookmarklet or browser plugin with the same effect.


open != exclusive


I think by “democratise AI” they meant “commercialise AI”. They should fix that typo in their mission statement.


MS AI !


It's part of their 2019 agreement for taking $1B in funding

> Multiyear partnership ... and an investment of $1 billion from Microsoft...

> Microsoft and OpenAI will jointly build new Azure AI supercomputing technologies

> OpenAI will port its services to run on Microsoft Azure, which it will use to create new AI technologies and deliver on the promise of artificial general intelligence

> Microsoft will become OpenAI’s preferred partner for commercializing new AI technologies

https://news.microsoft.com/2019/07/22/openai-forms-exclusive...


Financially it kinda makes sense. But isn’t this contrary to the main reason for OpenAI’s existence - to be... open?


OpenAI: This technology can be misused, hence we'll not release it to the public.

Also OpenAI: Exclusive access for Microsoft, which will eventually release it to the public, but only after paying OpenAI.

Elon is nothing more than a typical corporate type who slips away from his ethical commitments at the first opportunity to make money.


> Elon is nothing more than a typical corporate type who slips away from his ethical commitments at the first opportunity to make money.

What does it have to do with Elon Musk? The article does not mention him as the cause of OpenAI's actions, and he himself has said that he does not have control of the company: "I have no control & only very limited insight into OpenAI."[1]

1. https://twitter.com/elonmusk/status/1229546206948462597


Fair point. Then his name shouldn't be used to "market" it and build public trust either. The impression it gives the public is that he is in control, while there's no truth to it, and the public falsely believes in OpenAI's commitments because they assume Elon is associated with it.

My question: Why did Elon not announce his departure on Twitter as his own tweet rather than just replying to someone?

Here are a few pages on openai's website that still carry his name.

1. https://openai.com/blog/authors/elon/

2. "Additionally, Elon Musk will depart the OpenAI Board but will continue to donate and advise the organization" (https://openai.com/blog/openai-supporters/)

3. https://openai.com/blog/openai-technical-goals/

4. https://openai.com/blog/introducing-openai/


The full tweet mentions Dario, too. The full tweet text:

> I have no control & only very limited insight into OpenAI. Confidence in Dario for safety is not high.

Dario is the research director by the way. This tweet says a lot, in my opinion.


One more comment from Elon:

> This does seem like the opposite of open. OpenAI is essentially captured by Microsoft. [0]

https://twitter.com/elonmusk/status/1309052632850468864


Elon is not involved with OpenAI anymore: https://twitter.com/elonmusk/status/1229546206948462597?s=21


Let's just build another and be done with it.

It’s expensive to train with current parameters.

Can anyone weigh in on the following as a means of lowering compute cost?

https://www.infoq.com/news/2020/10/training-exceeds-gpt3/


This thing is pure clickbait. It's better than GPT-3 on language representation tasks, where GPT-3 was not very good to start with. GPT-3 is a language generation model, and on language generation tasks GPT-3 is absolutely better than anything else right now. In PET's author's own words: "GPT-3 certainly is much better than our approach at generating long sequences of text (e.g., summarization or machine translation)"

There are things that can be done to have something similar to GPT-3 in performance at a lower cost, but the reduction won't be anywhere near 99.9%.


thank you for the feedback.

id love to see this be open sourced.

surely someone is looking to recreate this.

is anyone doing so to your knowledge? current barrier as far as i understand is that it costs an estimated 4.6MM USD in compute to recreate.


> is anyone doing so to your knowledge? current barrier as far as i understand is that it costs an estimated 4.6MM USD in compute to recreate.

That I know of, no one in the open source community is looking into doing this. I think it's simply too hard and costly.

As I said in another comment, even if an open source org managed to raise $4.6M, they still wouldn't be able to do it, as the infrastructure needed to train a model of that size is not available publicly on any cloud.

The cluster Microsoft built for OpenAI is about 285k CPU cores and 10k V100 GPUs, which is absolutely huge. From GPT-3's paper we know the training required about 3,640 petaflop/s-days of compute. A single V100 GPU is about 14 TFLOP/s in half precision, so if you had 75 of those GPUs (which is already massive by open source standards, about 5 DGX-2 nodes), it would take about 10 years of training (by that same calculation, it still took about a month on their hypercluster if they used all the GPUs). And that is probably an optimistic estimate, since I am using Nvidia's marketing slide to estimate a V100's throughput.
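The arithmetic above is easy to sanity-check. The figures below are the same estimates used in this comment (3,640 petaflop/s-days from the GPT-3 paper, 14 TFLOP/s per V100 from Nvidia's marketing numbers), not official training specs:

```python
# Back-of-the-envelope training-time check using the comment's estimates.
TOTAL_COMPUTE_PFLOPS_DAYS = 3640   # GPT-3 paper's total compute estimate
V100_TFLOPS_FP16 = 14              # optimistic per-GPU half-precision rate

def training_days(num_gpus: int) -> float:
    """Days to reach the total compute at perfectly sustained throughput."""
    cluster_pflops = num_gpus * V100_TFLOPS_FP16 / 1000  # PFLOP/s
    return TOTAL_COMPUTE_PFLOPS_DAYS / cluster_pflops

print(f"75 GPUs:     about {training_days(75) / 365:.1f} years")
print(f"10,000 GPUs: about {training_days(10_000):.0f} days")
```

Real utilization would be well below 100%, so these are lower bounds on wall-clock time.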

Even if somehow an open source organisation was able to get their hands on a cluster half the size of the one OpenAI used, by the time they figured out all the data scraping/cleaning, found the correct hyperparameters by experimenting on smaller models, and developed the many other things and tricks you need to train those models (optimizers, correct checkpointing/resuming strategy, etc.), I believe one of OpenAI/Microsoft/Google/Nvidia would already have announced (or be very close to announcing) the next bigger model, and then it's back to square one.

I think it's just too big a project (with too little usefulness) to justify undertaking if you are not a massive organization.

The closest open source org that could pull this off that I can think of would be HuggingFace. They have the skill and decent funding, and they have the incentive (I believe they also want to enter that NLP API space), but they don't have access to that kind of compute infrastructure (that I know of). They have also been open source so far, but nothing guarantees they would share the checkpoints of such a model should they train it.


Great points. Thank you.

Is something like this capable of being run as a distributed computing project, akin to something like Folding@home?

Are there any obstacles that would make recreating a GPT-3 via a distributed system impossible...other than GPT-4 coming out before completion of such a system?


I'm not an expert in distributed computing so take what I say with a grain of salt, but I believe it would be really hard.

The premise of all the @home projects is that each compute unit can do some relatively large amount of work in isolation and then communicate the result back to a central server. If one of the clients disconnects in the middle of its computation, its work can just be redistributed to another client. And it doesn't really matter if a client is very slow. For the same reason, network latency barely matters in those scenarios.

On the other hand, to train GPT-3 the model is split across many GPUs at every level; even a simple matrix multiplication goes across the network, and there is no way to continue the computation until every GPU has communicated its result. So the issue here is that if a compute unit is very slow, then all the other ones* will block waiting for it, so your total cluster speed is defined by the slowest unit plus the slowest network time. And if a compute unit disconnects (which happens constantly in @home scenarios), then its work needs to be reassigned to another unit and re-executed, while every other unit sits idle waiting for that to happen.

I'm sure there are smarter ways to split such a model than just trying to replicate what we currently do on big compute clusters, but it's definitely a hard problem. Finding better ways to do distributed training with less communication between units is an active area of research.

*: Not exactly all the other ones, because you could divide your cluster into chunks, each chunk handling one batch of data. So the chunks would be independent to a certain degree. But the size of the batch has an impact on the final performance of the model (because of batch normalization), so you still want it to be quite big, which requires many clients for a single chunk.
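The straggler problem described above is easy to demonstrate with a toy simulation. This is my own illustrative model, not a real training setup: each synchronous step ends with a barrier (the all-reduce), so step time is the *maximum* of the per-worker times, and with a long-tailed fleet of volunteer machines the penalty grows with fleet size:

```python
import random

# Toy model of synchronous data-parallel training on volunteer hardware.
# Step time = slowest worker, because the all-reduce barrier waits for it.
random.seed(0)

def simulate(num_workers: int, steps: int = 1000) -> float:
    """Return the slowdown factor vs. the average worker's speed."""
    total, ideal = 0.0, 0.0
    for _ in range(steps):
        # Each worker nominally takes 1s; volunteer fleets have a long tail
        # of slow machines, modeled here with an exponential component.
        times = [1.0 + random.expovariate(2.0) for _ in range(num_workers)]
        total += max(times)                    # barrier: wait for straggler
        ideal += sum(times) / num_workers      # homogeneous-cluster baseline
    return total / ideal

print(f"10 workers:    {simulate(10):.2f}x slower than average")
print(f"1,000 workers: {simulate(1000):.2f}x slower than average")
```

The slowdown factor grows roughly logarithmically with fleet size under this tail model, which is why adding more unreliable volunteers eventually stops helping at all.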


Very interesting thank you.


I think your first sentence is an answer to every thread of this nature. The rarity of this sentiment and people's sentiment towards it are self-evident.

Training a language model similar to GPT-3 in capability, but cheaper (and smaller parameter-count-wise), is entirely possible for a committed team, if one applies a more modern architecture and finds cheaper training infrastructure, for example a 1080 Ti cluster or a few Graphcore servers. Also, one does not have to start from scratch: the initial model can be distilled from a publicly available snapshot of one of the larger language models.
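For readers unfamiliar with distillation: the core of it is training a small "student" model to match the softened output distribution of a large "teacher" checkpoint. A minimal sketch of the loss being minimized, with made-up logits purely for illustration (a real setup would use a deep-learning framework and actual teacher outputs):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened probability distribution over the logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions - the term a
    student's optimizer would minimize during distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 0.5, -1.0]        # next-token logits from the big model
good_student = [1.9, 0.6, -0.9]   # nearly matches the teacher
bad_student = [-1.0, 0.5, 2.0]    # disagrees strongly

assert distillation_loss(teacher, good_student) < distillation_loss(teacher, bad_student)
```

The temperature softens both distributions so the student also learns the teacher's relative preferences among unlikely tokens, not just its top pick.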

In fact, as we speak, several teams are racing towards this goal. I see this as the main reason for the prevalence of non-constructive comments in such threads: everybody with a constructive attitude has found a team or founded their own.

If you are interested in these questions, get in touch.



