More

orost · on April 17, 2024

That's not the model this post is about. You used the base model, not trained for tasks. (The instruct model is probably not on ollama yet.)

mysteria · on April 17, 2024

Yeah this is exactly what happens when you ask a base model a question. It'll just attempt to continue what you already wrote based off its training set, so if you say have it continue a story you've written it may wrap up the story and then ask you to subscribe for part 2, followed by a bunch of social media comments with reviews.

Sohcahtoa82 · on April 18, 2024

It can be fun, though, to prompt a text completion with something like "I'm thinking about" and just seeing what random thing it completes it with.

byteknight · on April 17, 2024

I absolutely did not:

ollama run mixtral:8x22b

EDIT: I like how you ninja-editted your comment ;)

orost · on April 17, 2024

Considering "mixtral:8x22b" on ollama was last updated yesterday, and Mixtral-8x22B-Instruct-v0.1 (the topic of this post) was released about 2 hours ago, they are not the same model.

byteknight · on April 17, 2024

Are we looking at the same page?

https://imgur.com/a/y6XfpBl

And even the direct tag page: https://ollama.com/library/mixtral:8x22b shows 40-something minutes ago: https://imgur.com/a/WNhv70B

orost · on April 17, 2024

Let me clarify.

Mixtral-8x22B-v0.1 was released a couple days ago. The "mixtral:8x22b" tag on ollama currently refers to it, so it's what you got when you did "ollama run mixtral:8x22b". It's a base model only capable of text completion, not any other tasks, which is why you got a terrible result when you gave it instructions.

Mixtral-8x22B-Instruct-v0.1 is an instruction-following model based on Mixtral-8x22B-v0.1. It was released two hours ago and it's what this post is about.

(The last updated 44 minutes ago refers to the entire "mixtral" collection.)

gliptic · on April 17, 2024

And where does it say that's the instruct model?

belter · on April 17, 2024

I get:

ollama run mixtral:8x22b

Error: exception create_tensor: tensor 'blk.0.ffn_gate.0.weight' not found

Me1000 · on April 17, 2024

You need to update ollama to 0.1.32.

belter · on April 17, 2024

Thanks. That did it.

orost · on March 31, 2024

Mistral 7B Instruct v0.2 and Mistral 7B v0.2 are different models. Judging by the title, I suspect OP meant to post about the latter, which was released a few days ago, but accidentally linked to the former instead.

CharlesW · on March 31, 2024

Oh! That makes sense, thank you. https://www.marktechpost.com/2024/03/31/mistral-ai-releases-...

orost · on Feb 4, 2024

An air-breathing jet engine doesn't need to carry oxidizer, which in a rocket is most of the propellant weight. It also has access to unlimited reaction mass, so it can be much more energy-efficient in producing thrust (it is more efficient to produce thrust by accelerating a lot of mass by a little, than by accelerating a little mass by a lot, but a rocket can't take advantage of this because it would need to carry all that extra mass. A plane can use ambient air for this purpose)

This all adds up to a plane needing to carry many times less mass to gain the same altitude and speed as a rocket, at least within relatively dense atmosphere.

foota · on Feb 5, 2024

Is there a reason that there isn't a stage 0 to rockets that takes advantage of these properties?

orost · on Feb 6, 2024

A rocket on a typical orbital launch profile spends less than 60 seconds in air dense enough for jet engines to have good performance, so there is little to gain.

Pegasus is an orbital rocket launched from an aircraft, but it doesn't exactly impress with performance or cost-effectiveness. Just doesn't make much sense to operate a huge aircraft and design your system around it just to improve on the least important 10% of the flight.

orost · on Dec 22, 2023

You can partially offload with some backends (e.g. llama.cpp and derivatives) but speed gains from that don't come in until it's mostly offloaded. I have 8GB VRAM and it's not enough to get any boost on mixtral in Q8. 16GB might do better or it might not.

The speed is quite good even on CPU only though, I get 3.5 tokens per second with 6 cores and DDR5-6000. For comparison llama2-70B is less than 1 t/s on the same hardware in Q4. And, subjectively, Mixtral performs better.

orost · on Nov 4, 2023

A reactor that has never been turned on isn't a significant radiation hazard. It's the fission products that are hazardous, not the fuel, if it's never gone critical there are no fission products yet.

adrianN · on Nov 4, 2023

It’sa sufficiently big radiation hazard that I wouldn’t want to be under it.

wizardforhire · on Nov 4, 2023

How much under? I’m guessing the real fear is contact, which then begs the question of dose over time.

Because as I’m sure you know and can see where I’m going with this, you’re already living under an enormous amount of lethal radiation, you and everyone else has been for their entire lives… its called the van allen belts.

orost · on Oct 15, 2023

The bazarek is fun but in reality even less relevant that this post makes it out to be. Since people with real information cannot prove it and it takes zero effort to post fakes the bazarekposts are not any more meaningful than random guesses. Arguing about them is just a pastime for people waiting for polls to close.

orost · on Sept 6, 2023

Preparations for pad repairs and upgrades were well underway before the first flight - the question was not whether they'd be necessary, but how much and how soon. In particular if I remember correctly manufacturing of the steel plating that now forms the pad started all the way back in January.

orost · on Aug 10, 2023

Anything with 64GB of memory will run a quantized 70B model. What else you need depends on what is acceptable speed for you. With a decent CPU but without any GPU assistance, expect output on the order of 1 token per second, and excruciatingly slow prompt ingestion. Any decent Nvidia GPU will dramatically speed up ingestion, but for fast generation, you need 48GB VRAM to fit the entire model. That means 2x RTX 3090 or better. That should generate faster than you can read.

Edit: the above is about PC. Macs are much faster at CPU generation, but not nearly as fast as big GPUs, and their ingestion is still slow.

quickthrower2 · on Aug 10, 2023

Vastai would rent you those for about $.50 an hour so gives you an idea of what it costs. Assuming the GPUs memory can be stacked

tstrimple · on Aug 10, 2023

Do these large models need the equivalent of SLI to take advantage of multiple GPU? Nvidia removed SLI from consumer cards a few years ago so I’m curious whether it’s even an option these days.

sterlind · on Aug 10, 2023

SLI isn't used at all for CUDA. if you meant NVLink, it's apparently not useful at small scales - I think the PCIe lanes are enough.

ipsum2 · on Aug 18, 2023

This is wrong, NVLink is crucial for tensor parallelism in models for training and in large (>40B param) models for inference.

orost · on July 23, 2023

The simulation is just so fake, almost everything that goes on is just decorative.

There is a budget, but after the first 30 minutes you'll always be running an enormous surplus without trying.

Citizen commute to work, but if they can't get there, the workplaces will continue to work, with some trivial penalty to efficiency.

There is traffic simulation, but if a jam forms, vehicles will start vanishing to unblock the road.

You can build public transport, but it doesn't matter if it's efficient, because the city's entire population can be standing at bus stops waiting forever with seemingly no ill effect.

Citizens will use parking spots if they're available, but if they aren't, they'll just disappear their car and reappear it later.

Zoned buildings get built and upgraded autonomously, but what gets built doesn't depend on economic factors, just on how many upgrade points are accrued from nearby services and attractions.

There is a large number of special buildings of various types that can be unlocked and built, but they all count as a tourist attraction and don't perform their actual function, they're effectively statues.

Cities Skylines is a bizarre un-game that has all the UX and presentation of a city simulator without any actual simulation.

Sohcahtoa82 · on July 24, 2023

I think a lot of the issues are symptoms of a greater design issue that I don't think is solvable:

Wanting traffic to visually flow in real-time speeds without having a 24-hour (real-time) day cycle.

If you want 1 day to be 60 minutes, and someone's commute takes 10 minutes, that means their car will be on the road for 33% of the game time, when in reality, a 10 minute commute would be 1.375% of a day's time. The result is the overall, there's FAR more traffic on the roads in C:S than is realistic.

Industrial especially just spawns far too much traffic, far more than is realistic. If you think 1 factory produces 5 semi-trucks worth of goods per day, but they spawn all those trucks in a 1-hour period and have them drive at real-time speeds, you end up producing 24x as many trucks as you should.

But as I said above, I don't think the problem is solvable and still make a fun game. You either have to make an in-game day take 24 hours, or make the cars animate faster.

Qwertious · on July 23, 2023

It sounds like they tried to remove failure modes, when they should instead have figured out a way to forgive failure modes after they occur. That way, players could still fail but it wouldn't result in a death spiral that wrecks the last hour+ of citybuilding.

orost · on July 1, 2023

Yes, many, huggingface is full of chat-tuned LLaMA derivatives that are supposed to replicate its performance, and tools like text-generation-webui or kobold.cpp can be used to run them with chat-style UX.

But for most tasks none of them come within a mile of GPT-3.5, or within a parsec of GPT-4.