Hacker Times

I didn't submit this. I didn't know anyone was submitting it, or I would have written a post about why it is different from other search engines. Whether it is better is still up for grabs, because it is new.

Samuru doesn't use link authority, it analyzes pages and matches what you queried to the types of pages and picks the best matches.

Let me give you an example.

You search for "How to Make cupcakes". Google says, "Give me the pages that contain all those words and have the most inbound links" (an oversimplification). The winner is Brandon's Cupcakes (not really, but play along for a minute) because it says, "We know how to make the best cupcakes, because we have been doing it for 25 years."

That is not a useful result. Samuru, on the other hand, says "how to make cupcakes is a search for instructions," and it looks for pages that match the words and are written as instructions.

We weigh other factors, like whether there is an author associated with the article. Does that author routinely write about the topic?

We do this for reviews, products and other things as well.
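A minimal sketch of the idea described above, assuming a hand-written rule table for query intent and a page type supplied by some separate content classifier. The rules, labels, and scoring (`classify_query`, `rank`, the +1.0 type bonus) are hypothetical and are not Samuru's actual method, which the thread does not spell out.

```python
import re

# Hypothetical intent rules, invented for illustration only.
INTENT_RULES = [
    (re.compile(r"^\s*how (to|do i|can i)\b", re.I), "instructions"),
    (re.compile(r"\breviews?\b", re.I), "review"),
    (re.compile(r"\b(buy|price|cheap)\b", re.I), "product"),
]


def classify_query(query: str) -> str:
    """Return the intent of the first rule that matches, else 'general'."""
    for pattern, intent in INTENT_RULES:
        if pattern.search(query):
            return intent
    return "general"


def rank(query: str, pages: list[tuple[str, str, str]]) -> list[str]:
    """Rank (url, text, page_type) triples for a query.

    page_type is assumed to come from a separate content classifier.
    Score = fraction of query words found on the page, plus 1.0 when the
    page type matches the query's intent.
    """
    intent = classify_query(query)
    terms = set(query.lower().split())
    scored = []
    for url, text, page_type in pages:
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        term_score = len(terms & words) / max(len(terms), 1)
        type_bonus = 1.0 if page_type == intent else 0.0
        scored.append((term_score + type_bonus, url))
    # sorted() is stable, so equal scores keep their input order.
    return [url for _, url in sorted(scored, key=lambda s: s[0], reverse=True)]
```

In the cupcake example, both pages contain every query word, so only the type bonus separates them: a page classified as instructions outranks a bakery's marketing copy.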

To be a full replacement for Google, we need driving directions, image search, and a lot of other things. But in order to do all the other things we are doing (related content, analysis, speed testing, building a corpus of words), we needed a search engine.

Responses get better if you search for something someone else has already searched, or repeat a search 30 seconds later. This is because we haven't deep-indexed the entire Internet yet, so we don't have all the deep data.



Re: your portrayal of how Google works... An "oversimplification"? It's just plain wrong. Google has, for quite a while, not depended on sites containing all the words of a query, and natural language processing plays a huge part in analyzing the intent of a query.

I applaud this ambitious project, but I'm skeptical you'll achieve what you aim for if you're way off the mark in understanding how Google is so successful. I mean, to even talk of replacing Google at this stage (and to say it's just a matter of providing rich snippets and other ancillary features, as if that were your engine's main deficiency compared to Google) is quite bold and a little cart-before-the-horse, IMO.

-

Edit: an example...I did a search for my own name, something I do habitually because I'm locked in an eternal struggle with a younger, better looking, more talented namesake for the top Google result. However, your search engine returns neither me nor my singing rival as the top result...instead you return the domain that is my first and last name with a hyphen, which is exactly the superficial result that Google was designed to avoid.


"I'm skeptical you'll achieve what you aim for if you're way off the mark in understanding how Google is so successful."

Considering his comment drew out Matt Cutts, I'd bet he understands Google pretty well.


Right, but the success of a technical product is usually judged on results that can be tested and verified, not on arguments from authority. Though if Cutts were willing to say something like, "Wow, the OP has created something that surpasses Google in [whatever metric]," I'd admit I'd take his word for it.

But the OP's claim stands on its own and makes assertions that can easily be verified. Are you arguing that Google's search engine is as simple and literal as the OP claims?


Google "Brandon Wirtz Greatest Living American." I am pretty good at understanding how Google works.

While Google gets things right without all of our language stuff, a lot of that is because they have user data about what people click on and which results they come back from after reading.

That data means that if they can get the stuff onto the front page, they can "crowdsource" the rest.

We don't have that user base for a feedback loop. We have to get our results entirely from software.
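A generic sketch of the kind of click feedback loop being described: blend a content-only score with a smoothed click-through rate. The formula and every parameter (`alpha`, `prior_ctr`, `prior_weight`) are assumptions for illustration; neither Google's nor Samuru's actual weighting is given here. With no click data at all, the smoothed CTR is the same constant for every page, so content alone decides the order, which is the position a brand-new engine is in.

```python
def blended_score(content_score: float, clicks: int, impressions: int,
                  alpha: float = 0.5, prior_ctr: float = 0.1,
                  prior_weight: float = 20.0) -> float:
    """Mix a content-only score with a smoothed click-through rate.

    The CTR is shrunk toward prior_ctr, as if prior_weight impressions had
    already been observed at that rate, so a page with no traffic gets the
    prior instead of zero. All parameter values are hypothetical.
    """
    smoothed_ctr = (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)
    # alpha controls how much user behaviour can override content analysis.
    return (1 - alpha) * content_score + alpha * smoothed_ctr
```

The choice of `alpha` is exactly the "page A vs page B" trade-off raised further down the thread: a larger value lets observed clicks override the content analysis.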


Not that it's as simple and literal, just that he purposely oversimplified it for the purposes of his comment.

It's not like he was writing up this big detailed blog post.


So you think Google knows if a piece of content is instructional?

Or that it is a review? (Not just that it has a rating.)

And you have to have all the words or a synonym of the words. But more importantly Google doesn't know what kind of content something is. Or what questions the content answers. Our system knows "this document answers what do aardvarks eat"


I will happily concede that you and your colleagues know more about search than I will ever know, and so I find it strange that we're having this argument...your perception of Google's limitations seems so far off that it's as if you've mistakenly referred to them instead of AltaVista.

To answer your questions, yes and yes, Google can derive the meaning of my search without relying on literal interpretation of the search terms. In fact, Google can return what I want even if I deliberately spell every word in the query incorrectly:

"couk besr ribz" instead of "cook best ribs" brings up recipes of how to do good ribs:

https://www.google.com/search?q=couk+besr+ribz

I'm willing to take you at your word that there is a better way to interpret context and meaning from a search query than however Google does it...but if that is the entire raison d'être for your search engine, can you come up with at least a few example cases where this holds? The "cupcakes" example you've contrived is clearly hypothetical (and not at all close to what happens in reality), and the queries I've tried don't seem to show any improvement in human-friendly results. Which is not to say that Samuru is a bad product...what you claim to do is incredibly difficult, and it is exactly the feature that makes Google such a useful, ubiquitous engine. I would love to be surprised, but I'm skeptical that a new engine with a fraction of Google's processing power, never mind the resources for test engineering and algorithm design, can compete with Google here. This isn't a case of "well, why hasn't anyone ranked search queries this way before?" in the way PageRank/BackRub was 15 years ago. Search engines have been analyzing queries for intent, and their shortcomings in this area are due to it being a very hard problem, not to a lack of desire.
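The misspelled-query example can be approximated with a classic technique (not necessarily what Google does): replace each unknown word with the vocabulary word at the smallest Levenshtein edit distance, breaking ties by word frequency. `VOCAB` and its frequencies are made up for illustration.

```python
# Toy vocabulary with made-up frequencies, for illustration only.
VOCAB = {"cook": 50, "cork": 5, "best": 80, "beer": 30, "ribs": 40, "rib": 10}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def correct_word(word: str, max_dist: int = 2) -> str:
    if word in VOCAB:
        return word
    # Smallest distance first, then the more frequent word (-f sorts it first).
    dist, _, best = min((edit_distance(word, w), -f, w) for w, f in VOCAB.items())
    return best if dist <= max_dist else word


def correct_query(query: str) -> str:
    return " ".join(correct_word(w) for w in query.lower().split())
```

With this toy vocabulary, "couk" is one edit away from both "cook" and "cork", and frequency breaks the tie in favour of "cook".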


Your Google link gives me "Searching for couk best ribs".

The 10 results give me:

2 hits for boat related stuff, with no food.

4 hits for restaurants, with no recipes.

1 hit about cervical ribs.

And then, in positions 5, 8, and 9 there are recipes.

So, even with all the smart people and computing power at Google this is still a very hard problem.

It's great to have different people trying different approaches.


That's strange; I get links from amazingribs.com and food.com, which is more or less what I expect:

http://www.amazingribs.com/recipes/porknography/best_BBQ_rib...

Either way, it's a deliberately contrived example to show that Google was more sophisticated than the GP indicates...I have no comment on whether amazingribs.com really does have the best ribs preparation tips


Hover over the Samuru logo on the results page. We will tell you a bit about what kind of search we think you are doing.

http://www.samuru.com/?q=python+list+comprehension

While we get the same top result for this, we also show that WikiHow.com linked to the Python docs, so you can see closely related articles grouped together.

This is useful for things like http://www.samuru.com/?q=samsung+apple+patent+press+release
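One way to build the grouping described above is to treat "result X links to or cites result Y" as an edge and take connected components with union-find. This is a sketch under that assumption; `thread_results` and the example URLs are hypothetical. Groups are ordered by their best-ranked member, and members keep their rank order.

```python
def thread_results(results: list[str], links: set[tuple[str, str]]) -> list[list[str]]:
    """Group ranked results into threads; links holds (citing, cited) pairs."""
    parent = {url: url for url in results}

    def find(u: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in links:
        if a in parent and b in parent:  # ignore links to pages not in results
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    groups: dict[str, list[str]] = {}
    for url in results:  # iterate in rank order to keep ordering stable
        groups.setdefault(find(url), []).append(url)
    return list(groups.values())
```

For example, if a WikiHow result cites the Python docs, the two appear together in one thread at the position of whichever ranked higher.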

But Google requires the words or their synonyms to be on the result page. So do we, but we know that a search for http://www.samuru.com/?q=cook+bbq+ribs is not just about how to make BBQ ribs but about how to make the best BBQ ribs. Because this is a subjective topic (I like honey BBQ, someone else likes mesquite), that factors into our results.


I repeated the experiment and got different results than yours. (woot, science!)

http://imgur.com/ghYKH7K shows my results. I can't reproduce the ones you describe.

Regional variations might account for the differences. Do you live in a place where people don't eat ribs? (If so, please accept my sincere condolences)


>So you think Google knows if a piece of content is instructional?

Yes.

>Or know that it is a Review? (not just has a rating)

Yes.

>But more importantly Google doesn't know what kind of content something is.

You don't think Google knows how to classify content?


Then you think wrong.

Google can't tell that something is instructions. That's not something they do. It may know that eHow has a lot of pages with "How To" on them, but it doesn't know whether those pages give step-by-step instructions for doing something, nor does it know that Rotten Tomatoes has pages with opinions, points, and conclusions that make up a review.


For the sake of realism only: any basic supervised machine learning algorithm does that, given a properly labeled dataset. I have built many. I can assure you this is so classic that Google has it, and so do many other companies. And there are much better and more accurate solutions in place for combining all signals of this kind.
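As an illustration of the supervised approach described here, a multinomial naive Bayes classifier with Laplace smoothing can label text as instructional or not. The six training sentences are invented and far too few for real use; this is a sketch, not a claim about what Google or anyone else actually runs.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy training set, for illustration only.
TRAIN_DOCS = [
    "step 1 preheat the oven step 2 mix the flour and sugar",
    "first remove the wheel then loosen the bolts next replace the tire",
    "step 1 open the settings step 2 click save",
    "we have been baking the best cupcakes for 25 years visit our shop",
    "our company offers great prices and friendly service",
    "the film was released in 1999 and stars a famous actor",
]
TRAIN_LABELS = ["instructions"] * 3 + ["other"] * 3


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokenize(doc):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage: `NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS).predict(text)`. A production system would use a large labeled corpus and richer features than bag-of-words.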


Well, their move to semantic technologies, e.g. Schema.org, is a step toward understanding these things better. If you mark up a page with http://schema.org/Review, wouldn't you agree that they know it's a review?

Are you using semantic web btw?


No. They know you tagged it a review.

I could tag it review and have it be a sales copy page for a product.

Our stuff knows that a review expresses an opinion, backs that up with facts, and makes a conclusion.
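A crude sketch of the structural test described above: treat a page as a review only if it contains opinion, supporting evidence, and a conclusion, regardless of any markup. The cue lists are invented for illustration; a real system would presumably use a trained model rather than keyword matching.

```python
import re

# Hypothetical cue lists, invented for illustration only.
OPINION_CUES = ("i think", "i loved", "i liked", "in my opinion", "disappointing", "i felt")
EVIDENCE_PATTERN = re.compile(r"\b(because|for example|\d+)\b")
CONCLUSION_CUES = ("overall", "in conclusion", "bottom line", "verdict")


def looks_like_review(text: str) -> bool:
    """True only if the text has an opinion, evidence, and a conclusion."""
    t = text.lower()
    has_opinion = any(cue in t for cue in OPINION_CUES)
    has_evidence = bool(EVIDENCE_PATTERN.search(t))
    has_conclusion = any(cue in t for cue in CONCLUSION_CUES)
    return has_opinion and has_evidence and has_conclusion
```

A page carrying schema.org/Review markup that is really sales copy would fail this check, because it lacks opinion and conclusion cues even if it mentions prices.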


And are you using Semantics? Going to Semtechbiz in June? If so, I'll buy you a beer ;)


>But more importantly Google doesn't know what kind of content something is. Or what questions the content answers.

I'm not sure I agree with this. Google may not know it explicitly but it can effectively know it, perhaps in some cases even better than the average human would if given the same search keys. It is the difference between knowing what a word means by the way people actually use it versus by looking it up in the dictionary.

Google essentially crowdsources its results, taking advantage of the fact that people entering similar queries probably have similar intentions. If your users are only using keywords, with little regard for word order, I don't see how you can do better than Google on average. You may do better for obscure queries where the best search result cannot easily be inferred from the keywords present, but how many pages are like this? Furthermore, if there is a set of keywords where semantic analysis suggests page A is the best result but the page most people actually click on is page B, which do you return first? What if your index doesn't even have B? This is the challenge you will face when trying to do better than Google on average. Nevertheless, I applaud your work and will definitely keep Samuru at the ready for the queries that Google struggles with.


Who said we don't use word order?

Who said we don't use similar search to infer intent?

We can't really use click-throughs until we have users. We need feedback to improve results. But this also powers the related search in our TLDR products (http://www.tldrstuff.com), and those have to work with much more abstract queries, because they aren't user queries; they are generated queries.


I'm not saying that you don't use word order. I think you must to some extent if you are doing a comprehensive semantic analysis of the keywords and websites. I'm saying that your users may not use correct word order because they are lazy and because they are used to the behavior of search engines that don't require it but still give okay results.

Perhaps you have solved this problem, but I just don't see how you can offer better results than Google on average (without having a similar sized index) when users are just throwing together a bunch of words related to what they are looking for. It seems to me that if we want to take advantage of systems like yours for search and if we want to get better results than Google, we need to change users' behavior; they need to learn to give more precise queries.


I think technology should strive for a zero learning curve. If I type "Chicken Chord On Blue," it should figure out that I probably need to know that I spelled it wrong, what it is, and how to make it. A user shouldn't have to know the answer they are looking for in advance.


We have that issue because it turns out many brand pages don't have any content. We are still balancing the domain bonus, but think of how many brand home pages have no text. No text means we have nothing to analyze.


That's why you analyze inbound links, too.


When you are building a technology, you have to isolate and test. We use indicators that are harder to game than inbound links, like traffic. But it is a balancing act: we have just shy of 100 score factors we can tweak, and getting them right takes time.
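A sketch of how tunable score factors might combine, assuming a simple weighted sum. The factor names and weights are hypothetical (the thread only says there are nearly 100 factors); the example shows how over-weighting a domain-match bonus can push a homepage with little relevant content above a genuinely useful page.

```python
# Hypothetical factor names and weights, invented for illustration.
WEIGHTS = {
    "term_match": 3.0,            # fraction of query words on the page
    "content_type_match": 2.0,    # page type fits the query intent
    "author_topic_history": 1.0,  # author writes often on this topic
    "traffic": 0.5,               # normalised traffic signal
    "domain_match": 0.25,         # query words appear in the domain name
}


def score(features: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of factor values, each assumed to lie in [0, 1]."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def rank(pages: dict[str, dict[str, float]],
         weights: dict[str, float] = WEIGHTS) -> list[str]:
    """Return page URLs ordered by descending score."""
    return sorted(pages, key=lambda url: score(pages[url], weights), reverse=True)
```

Tuning then means adjusting `WEIGHTS` and re-checking rankings on a fixed set of queries, one factor at a time, in the "isolate and test" spirit described above.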


Hey Brandon, congrats on launching Samuru! I'll be curious what you think of running a search engine after being an SEO for so many years.


I am enjoying it. We went with the approach of: how can we make this impossible for Brandon Wirtz to game? How can we make this about the content more than a popularity contest?

Now that we are both in the business of stopping spam we should grab lunch sometime.


I'll be up in Seattle in June for the SMX Advanced conference. Not sure if you'll be at SMX?


Maybe. I'm in Phoenix because that's where Stremor is based, but I'll hit you up next time I am in Mountain View.


Matt, any insight for people wanting to build new search areas out?


Quick comment on the interface: it looks like the initial page is optimized for 1024x768. I'm on a netbook at 1024x600 (even less, since I'm not in fullscreen). While I realize that I'm in the minority, the issue seems to be that everything is positioned absolutely. The only reason the bottom of the page is cut off is that there is a bunch of empty space between the top of the page and the logo: http://imgur.com/vDFwY8b

I realize that this is a bit of a nitpick, but I felt the need to mention it.


This is really strange. I just searched for "how to ride a bike" and the first links from Samuru are completely useless whereas the first link from Google is exactly what I wanted, instructions from wikihow. How do you explain that?


Pre-Google, this is pretty much how search engines worked... by analysing page content and weighting that rather than the network of links around the page.

Having just played with it, it feels both backwards and refreshing to go back to that. The results are different enough to feel good for the terms I used.


Other features I should have mentioned: Threaded results. If a result is cited by other results, they will be grouped so that you can see the conversation across sites.

Better social media integration. We use Facebook, Twitter, and Google Plus, not just Google Plus, for showing authors.

Voice Input if you are on Chrome 25 or higher.

Results are returned with Summaries not Snippets.

With that, I am falling asleep. I have enjoyed answering questions on this thread and the https://qht.co/item?id=5579336 thread, but 5 hours of it has worn me out. If you leave comments, I promise to get back to them.


So how exactly do you get better results 30 seconds later? How do you index more relevant pages? Do you... Google it?


We have a shallow index of pages. We do a deep index only if a page then shows up in search results.
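A minimal sketch of that two-tier scheme, assuming a cheap shallow text per page and a full-text fetch that is queued only for pages that surface in results. `TwoTierIndex`, `fetch_full_text`, and the term-count scoring are hypothetical; the point is that a repeated query can be ranked on richer data than the first one, which would explain results improving on a second search.

```python
from typing import Callable


class TwoTierIndex:
    """Hypothetical index: shallow text for every page, deep text on demand."""

    def __init__(self, fetch_full_text: Callable[[str], str]):
        self.shallow: dict[str, str] = {}  # url -> short text (e.g. title)
        self.deep: dict[str, str] = {}     # url -> full text, filled lazily
        self.pending: set[str] = set()     # urls queued for deep indexing
        self.fetch_full_text = fetch_full_text

    def add(self, url: str, shallow_text: str) -> None:
        self.shallow[url] = shallow_text

    def _score(self, url: str, terms: list[str]) -> int:
        # Use deep text when available, otherwise fall back to shallow text.
        words = self.deep.get(url, self.shallow[url]).lower().split()
        return sum(words.count(t) for t in terms)

    def search(self, query: str) -> list[str]:
        terms = query.lower().split()
        scored = [(self._score(u, terms), u) for u in self.shallow]
        hits = [u for s, u in sorted(scored, key=lambda p: p[0], reverse=True) if s > 0]
        # Pages that surfaced but are not deep-indexed yet get queued.
        self.pending.update(u for u in hits if u not in self.deep)
        return hits

    def process_pending(self) -> None:
        # A real system would do this in the background, not synchronously.
        for url in sorted(self.pending):
            self.deep[url] = self.fetch_full_text(url)
        self.pending.clear()
```

Pages that never match on their shallow text are never deep-indexed, which keeps crawl cost proportional to what users actually search for.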


thanks


Hi, congratulations! I really like Samuru, even if it's not perfect. I wanted to ask you two questions:

1) Are you sure that giving a "bonus" to domains containing part of the query is a good idea? I understand the reason behind it, and I know you need time before you can turn this "bonus" off, but in the meantime, are you really sure it is a good idea?

When I type "How to rank well on Google", the first result is www.google.com => http://www.samuru.com/?q=How+to+rank+well+in+Google

From the third position onward, though, the pages seem great.

2) How does search suggest work?

I'm a French user, and our language has a lot of accents like "é è ù à". While typing a search query, many people don't use them. When I type a query correctly with the accents, Samuru suggests the same query without the accents. This is wrong, and that's why I'm wondering where the data the search engine uses to provide these query suggestions comes from.
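No answer to this question appears here, so the cause is unknown. One common design (an assumption in this sketch) is to match suggestions on accent-folded keys, so "creme" and "crème" find the same entries; the behaviour described would appear if the folded key, rather than an accented original, is what gets displayed. This sketch folds accents only for matching and displays an accented spelling when one has been seen. `suggest` and its counting scheme are hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip diacritics: NFD splits 'é' into 'e' + a combining accent mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def suggest(prefix: str, past_queries: list[str], limit: int = 5) -> list[str]:
    key = fold_accents(prefix).lower()
    counts: dict[str, int] = {}
    display: dict[str, str] = {}
    for q in past_queries:
        folded = fold_accents(q).lower()
        counts[folded] = counts.get(folded, 0) + 1
        # Match on the folded form, but prefer showing an accented spelling.
        if folded not in display or q != fold_accents(q):
            display[folded] = q
    matches = [f for f in counts if f.startswith(key)]
    matches.sort(key=lambda f: -counts[f])  # most frequent first
    return [display[f] for f in matches[:limit]]
```

Users who type without accents still get matches, while users who type "crème" see "crème brûlée" rather than the unaccented form.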

I really hope you succeed with this project.


Sorry, I forgot to tell you that people are talking about you not only in America but also in France: http://www.lightonseo.com/moteurs-de-recherche/624-0423-samu... I would be very happy if you could link to this page in the news list on the Stremor website.


I just searched for How to make cupcakes and the first three results were instructions of how to make a cupcake, including a video on youtube.

P.S.: I mistakenly typed HOT to make cupcakes


It is now very hard to get Google results that contain all your search terms, which is why I'm starting to dislike it... For example, it gives you "synonyms," or the terms are completely missing. I'll certainly give yours a try.


I am more curious about traffic. Does your search engine send traffic to sites the way Google does?


We will as more people use us. We think that because we provide a summary of your page rather than a snippet, we drive more traffic if you deserve it. Snippets don't really "sell" your content or your writing style. Summaries do. We think that by giving people more insight into what to expect, rather than fragments of a few sentences containing the keywords they searched for, we help users make better decisions about what to read.
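To make the snippet/summary distinction concrete: a snippet is a fixed window of text around a query term, while a summary selects whole sentences that represent the page. The sketch below uses a simple word-frequency extractive summarizer as an assumed stand-in, since the thread does not say how Samuru builds its summaries; `STOPWORDS`, `snippet`, and `summary` are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "for", "on", "with", "that", "this"}


def sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def snippet(text: str, query: str, width: int = 40) -> str:
    """Keyword-in-context: a fixed window around the first query term found."""
    lower = text.lower()
    for term in query.lower().split():
        pos = lower.find(term)
        if pos != -1:
            start = max(0, pos - width)
            return "..." + text[start:pos + len(term) + width] + "..."
    return text[: 2 * width] + "..."


def content_words(s: str) -> list[str]:
    return [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS]


def summary(text: str, n: int = 2) -> str:
    """Pick the n sentences whose content words are most frequent across
    the whole page, and return them in their original order."""
    sents = sentences(text)
    freq = Counter(content_words(text))

    def score(s: str) -> float:
        toks = content_words(s)
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    top = sorted(range(len(sents)), key=lambda i: score(sents[i]), reverse=True)[:n]
    return " ".join(sents[i] for i in sorted(top))
```

On a short recipe page, `summary` returns whole sentences about the recipe, while `snippet` returns a window that may cut words or sentences in half.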



