Hacker Times | orost's comments

That's not the model this post is about. You used the base model, which isn't trained for tasks. (The instruct model is probably not on ollama yet.)


Yeah, this is exactly what happens when you ask a base model a question. It'll just attempt to continue what you already wrote based on its training set, so if you, say, have it continue a story you've written, it may wrap up the story and then ask you to subscribe for part 2, followed by a bunch of social media comments with reviews.


It can be fun, though, to prompt a text completion with something like "I'm thinking about" and just see what random thing it completes it with.


I absolutely did not:

ollama run mixtral:8x22b

EDIT: I like how you ninja-edited your comment ;)


Considering "mixtral:8x22b" on ollama was last updated yesterday, and Mixtral-8x22B-Instruct-v0.1 (the topic of this post) was released about 2 hours ago, they are not the same model.


Are we looking at the same page?

https://imgur.com/a/y6XfpBl

And even the direct tag page, https://ollama.com/library/mixtral:8x22b, shows an update 40-something minutes ago: https://imgur.com/a/WNhv70B


Let me clarify.

Mixtral-8x22B-v0.1 was released a couple days ago. The "mixtral:8x22b" tag on ollama currently refers to it, so it's what you got when you did "ollama run mixtral:8x22b". It's a base model only capable of text completion, not any other tasks, which is why you got a terrible result when you gave it instructions.

Mixtral-8x22B-Instruct-v0.1 is an instruction-following model based on Mixtral-8x22B-v0.1. It was released two hours ago and it's what this post is about.

(The last updated 44 minutes ago refers to the entire "mixtral" collection.)


And where does it say that's the instruct model?


I get:

ollama run mixtral:8x22b

Error: exception create_tensor: tensor 'blk.0.ffn_gate.0.weight' not found


You need to update ollama to 0.1.32.


Thanks. That did it.


Mistral 7B Instruct v0.2 and Mistral 7B v0.2 are different models. Judging by the title, I suspect OP meant to post about the latter, which was released a few days ago, but accidentally linked to the former instead.



An air-breathing jet engine doesn't need to carry oxidizer, which in a rocket is most of the propellant weight. It also has access to effectively unlimited reaction mass, so it can be much more energy-efficient in producing thrust. (It is more efficient to produce thrust by accelerating a lot of mass by a little than by accelerating a little mass by a lot, but a rocket can't take advantage of this because it would need to carry all that extra mass. A plane can use ambient air for this purpose.)
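The "lot of mass by a little" point follows from two textbook relations: thrust is F = ṁ·Δv, while the kinetic power put into the exhaust is P = ½·ṁ·Δv² = ½·F·Δv. So for a fixed thrust, the power needed grows linearly with how hard you accelerate the working mass. A minimal sketch, assuming a vehicle at rest (static thrust) and ignoring fuel mass, nozzle losses and flight speed; the 100 kN thrust and the Δv values are arbitrary example numbers, not real engine data:

```python
# Simplified static-thrust model (assumption: vehicle at rest, no losses):
#   thrust F = mdot * dv
#   kinetic power given to the exhaust P = 0.5 * mdot * dv**2 = 0.5 * F * dv


def mass_flow_for_thrust(thrust_n: float, delta_v_ms: float) -> float:
    """Mass flow (kg/s) needed to produce thrust_n by accelerating mass by delta_v_ms."""
    return thrust_n / delta_v_ms


def jet_power(thrust_n: float, delta_v_ms: float) -> float:
    """Kinetic power (W) put into the exhaust stream for that thrust."""
    return 0.5 * thrust_n * delta_v_ms


THRUST = 100_000.0  # 100 kN, an arbitrary example value

for dv in (300.0, 1_000.0, 3_000.0):
    mdot = mass_flow_for_thrust(THRUST, dv)
    power_mw = jet_power(THRUST, dv) / 1e6
    print(f"dv={dv:6.0f} m/s  mdot={mdot:7.1f} kg/s  power={power_mw:6.1f} MW")
```

In this simplified model, going from 3,000 m/s (roughly chemical-rocket exhaust territory) down to 300 m/s cuts the power for the same thrust tenfold but needs ten times the mass flow, which a jet engine can take from the air and a rocket would have to carry.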

This all adds up to a plane needing to carry a fraction of the mass a rocket would to reach the same altitude and speed, at least within relatively dense atmosphere.


Is there a reason that there isn't a stage 0 to rockets that takes advantage of these properties?


A rocket on a typical orbital launch profile spends less than 60 seconds in air dense enough for jet engines to have good performance, so there is little to gain.

Pegasus is an orbital rocket launched from an aircraft, but it doesn't exactly impress with performance or cost-effectiveness. It just doesn't make much sense to operate a huge aircraft and design your whole system around it just to improve on the least important 10% of the flight.


You can partially offload layers to the GPU with some backends (e.g. llama.cpp and its derivatives), but the speed gains don't kick in until the model is mostly offloaded. I have 8GB of VRAM and it's not enough to get any boost on Mixtral in Q8. 16GB might do better, or it might not.
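The "gains only when mostly offloaded" behaviour is essentially Amdahl's law: per-token time is CPU-layer time plus GPU-layer time, so only the offloaded share gets faster. A rough sketch, assuming the model is Mixtral 8x7B (~46.7B parameters across 32 layers), that "Q8" means llama.cpp's Q8_0 at about 8.5 bits per weight, and, hypothetically, that about 1 GB of the 8 GB card is kept free for context and that GPU layers run 10x faster than CPU layers. The layer count here is the number llama.cpp's `--n-gpu-layers` (`-ngl`) option sets:

```python
import math

# Assumed figures (not from the comment): Mixtral 8x7B has ~46.7B parameters
# in 32 transformer layers; Q8_0 quantization stores ~8.5 bits per weight.
TOTAL_PARAMS = 46.7e9
N_LAYERS = 32
BITS_PER_WEIGHT = 8.5


def layer_bytes() -> float:
    """Approximate bytes per layer, spreading all weights evenly across layers."""
    return TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / N_LAYERS


def layers_that_fit(usable_vram_gb: float) -> int:
    """How many whole layers fit in the given VRAM (GB = 1e9 bytes)."""
    return min(N_LAYERS, math.floor(usable_vram_gb * 1e9 / layer_bytes()))


def offload_speedup(gpu_fraction: float, gpu_speedup: float) -> float:
    """Amdahl's law: only the offloaded fraction of per-token work gets faster."""
    return 1.0 / ((1.0 - gpu_fraction) + gpu_fraction / gpu_speedup)


# Hypothetical: 8 GB card with ~1 GB kept free for context/KV cache,
# and GPU layers running 10x faster than CPU layers.
n = layers_that_fit(7.0)
frac = n / N_LAYERS
print(f"{n} of {N_LAYERS} layers on GPU -> {offload_speedup(frac, 10.0):.2f}x")
for f in (0.25, 0.5, 0.9, 1.0):
    print(f"{f:.0%} offloaded -> {offload_speedup(f, 10.0):.2f}x")
```

Under these assumptions only 4 of 32 layers fit, for roughly a 13% speedup, while a 90% offload would be about 5x. The real GPU/CPU ratio depends on the hardware; the shape of the curve is the point.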

The speed is quite good even on CPU only, though: I get 3.5 tokens per second with 6 cores and DDR5-6000. For comparison, llama2-70B runs at less than 1 t/s on the same hardware in Q4. And, subjectively, Mixtral performs better.
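One plausible reason Mixtral is so much faster on CPU: generation is usually memory-bandwidth bound, and as a mixture-of-experts model it only reads the weights of 2 of its 8 experts per token. A back-of-the-envelope upper bound, assuming dual-channel DDR5-6000, ~12.9B active parameters per token for Mixtral 8x7B at Q8_0 (~8.5 bits per weight), and Llama 2 70B at Q4_0 (~4.5 bits per weight); these figures are assumptions, not measurements:

```python
# Token generation on CPU is usually memory-bandwidth bound: each generated
# token has to stream the active weights from RAM roughly once.
# All figures below are assumptions for illustration, not measurements.


def ram_bandwidth_gbs(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit = 8-byte channels)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_s(active_params: float, bits_per_weight: float, bw_gbs: float) -> float:
    """Upper bound on tokens/s if each token reads all active weights once."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bw_gbs * 1e9 / bytes_per_token


bw = ram_bandwidth_gbs(6000, channels=2)  # assumed dual-channel DDR5-6000
# Mixtral 8x7B: ~12.9B parameters active per token (2 of 8 experts), Q8_0 ~8.5 bits
mixtral = max_tokens_per_s(12.9e9, 8.5, bw)
# Llama 2 70B: dense, all ~70B parameters active, Q4_0 ~4.5 bits
llama70 = max_tokens_per_s(70e9, 4.5, bw)
print(f"bandwidth {bw:.0f} GB/s: mixtral <= {mixtral:.1f} t/s, llama2-70B <= {llama70:.1f} t/s")
```

These are ceilings (about 7.0 vs 2.4 t/s here), so measured speeds below them are expected; the sketch only illustrates why the sparse model has a several-times higher ceiling on the same memory bus.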


A reactor that has never been turned on isn't a significant radiation hazard. It's the fission products that are hazardous, not the fuel; if it has never gone critical, there are no fission products yet.


It’s a sufficiently big radiation hazard that I wouldn’t want to be under it.


How far under? I’m guessing the real fear is contact, which then raises the question of dose over time.

Because, as I’m sure you know (and can see where I’m going with this), you’re already living under an enormous amount of lethal radiation, as has everyone else for their entire lives… it’s called the Van Allen belts.


The bazarek is fun, but in reality it's even less relevant than this post makes it out to be. Since people with real information cannot prove it and it takes zero effort to post fakes, the bazarek posts are not any more meaningful than random guesses. Arguing about them is just a pastime for people waiting for the polls to close.


Preparations for pad repairs and upgrades were well underway before the first flight; the question was not whether they'd be necessary, but how much and how soon. In particular, if I remember correctly, manufacturing of the steel plating that now forms the pad started all the way back in January.


Anything with 64GB of memory will run a quantized 70B model. What else you need depends on what is acceptable speed for you. With a decent CPU but without any GPU assistance, expect output on the order of 1 token per second, and excruciatingly slow prompt ingestion. Any decent Nvidia GPU will dramatically speed up ingestion, but for fast generation, you need 48GB VRAM to fit the entire model. That means 2x RTX 3090 or better. That should generate faster than you can read.

Edit: the above is about PCs. Macs are much faster at CPU generation, but not nearly as fast as big GPUs, and their prompt ingestion is still slow.
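The 64GB and 48GB figures come from simple arithmetic: weight memory is roughly parameters × bits per weight ÷ 8. A rough sketch, with bits-per-weight values that are approximate assumptions for common llama.cpp-style quantizations; it ignores the KV cache, the OS, and runtime overhead, which all need some headroom on top:

```python
# Rough weight-memory estimate for a quantized model: params * bits / 8.
# Bits-per-weight values are approximate, assumed figures; KV cache and
# runtime overhead are NOT included and need extra headroom.


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


PARAMS_70B = 70e9
for name, bpw in (("fp16", 16.0), ("8-bit", 8.5), ("4-bit", 4.5)):
    size = weights_gb(PARAMS_70B, bpw)
    print(f"{name:>5}: {size:6.1f} GB  fits 64 GB RAM: {size < 64}  fits 48 GB VRAM: {size < 48}")
```

A 4-bit 70B model comes out around 39 GB, which fits in 64 GB of RAM or across 2x 24 GB cards, while 8-bit (~74 GB) already fits in neither.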


Vast.ai would rent you those for about $0.50 an hour, which gives you an idea of what it costs, assuming the GPUs' memory can be stacked.


Do these large models need the equivalent of SLI to take advantage of multiple GPU? Nvidia removed SLI from consumer cards a few years ago so I’m curious whether it’s even an option these days.


SLI isn't used at all for CUDA. If you meant NVLink, it's apparently not useful at small scales; I think the PCIe lanes are enough.


This is wrong: NVLink is crucial for tensor parallelism, both in training and when running inference on large (>40B param) models.


The simulation is just so fake: almost everything that goes on is purely decorative.

There is a budget, but after the first 30 minutes you'll always be running an enormous surplus without trying.

Citizens commute to work, but if they can't get there, the workplaces will keep working anyway, with some trivial penalty to efficiency.

There is traffic simulation, but if a jam forms, vehicles will start vanishing to unblock the road.

You can build public transport, but it doesn't matter if it's efficient, because the city's entire population can be standing at bus stops waiting forever with seemingly no ill effect.

Citizens will use parking spots if they're available, but if they aren't, they'll just make their car disappear and reappear later.

Zoned buildings get built and upgraded autonomously, but what gets built doesn't depend on economic factors, just on how many upgrade points are accrued from nearby services and attractions.

There is a large number of special buildings of various types that can be unlocked and built, but they all count as a tourist attraction and don't perform their actual function, they're effectively statues.

Cities: Skylines is a bizarre un-game that has all the UX and presentation of a city simulator without any actual simulation.


I think a lot of the issues are symptoms of a greater design issue that I don't think is solvable:

Wanting traffic to visually flow at real-time speed without having a 24-hour (real-time) day cycle.

If you want 1 day to be 60 minutes, and someone's commute takes 10 minutes each way, their car will be on the road for 33% of the game day, when in reality, a 10-minute commute each way would be about 1.39% of a day. The overall result is that there's FAR more traffic on the roads in C:S than is realistic.

Industry in particular spawns far too much traffic. If you assume 1 factory produces 5 semi-trucks worth of goods per day, but the game spawns all those trucks in a 1-hour period and has them drive at real-time speeds, you end up producing 24x as many trucks as you should.
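The commute numbers are easy to check, assuming the 10-minute commute is each way (two trips per day), which is what the 33% figure implies. A minimal sketch comparing the share of a day one car spends on the road in a 60-minute game day versus a real 24-hour day:

```python
# Share of a day one commuter's car spends on the road when the game
# compresses a 24-hour day into 60 real minutes but cars drive at
# real-time speed. Assumption: 10 minutes each way, two trips per day.

REAL_DAY_MIN = 24 * 60  # 1440 minutes in a real day
GAME_DAY_MIN = 60       # one in-game day lasts 60 real minutes


def road_share(trip_min: float, trips_per_day: int, day_min: float) -> float:
    """Fraction of the day a single car is on the road."""
    return trip_min * trips_per_day / day_min


game = road_share(10, 2, GAME_DAY_MIN)
real = road_share(10, 2, REAL_DAY_MIN)
print(f"game: {game:.1%}, real: {real:.2%}, inflation: {game / real:.0f}x")
```

The ratio is just 1440 / 60 = 24, the same factor as in the truck example: a day's worth of trips squeezed into one real hour while each vehicle stays on the road for real-time durations.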

But as I said above, I don't think the problem can be solved while still making a fun game. You either have to make an in-game day take 24 real hours, or make the cars animate faster.


It sounds like they tried to remove failure modes, when they should instead have figured out a way to forgive failure modes after they occur. That way, players could still fail but it wouldn't result in a death spiral that wrecks the last hour+ of citybuilding.


Yes, many: Hugging Face is full of chat-tuned LLaMA derivatives that are supposed to replicate its performance, and tools like text-generation-webui or kobold.cpp can be used to run them with a chat-style UX.

But for most tasks none of them come within a mile of GPT-3.5, or within a parsec of GPT-4.

