Is the engineering/logistics behind mechanical watches really costlier than OnePlus phones?
Yes and no, right?
Altogether, there are probably orders of magnitude more engineering involved in a mobile phone. But a lot of the "hard stuff" when it comes to a OnePlus phone is not done by them: they are not running their own chip foundries, designing their own silicon, researching their own battery tech, mining their own rare earth elements for the batteries, writing their own kernel from scratch, etc.
Which is not to say I personally find a Tissot to be worth $700, relative to other watches.
But! This stuff is all just "functional jewelry" anyway; even a Rolex doesn't keep time as well as a $15 quartz Casio. Could a Tissot provide $700 of happiness to somebody else, who isn't me? Maybe! Heck, I'm not even familiar with their whole catalog; perhaps they have a bunch of watches I'd love.
It's also worth noting you can get a perfectly nice mechanical watch from Seiko or Orient for $100-$500 USD. For me, that starts to approach some kind of reasonable value. Not on purely functional terms, but as something I will wear and enjoy for hundreds and possibly thousands of hours? I can't justify it on logical terms, but there are worse ways to spend disposable income. Barring disaster, a watch will also work for decades, whereas a phone is not likely to provide much joy to its owner beyond five years or so.
(In fact, if you don't mind buying from non-authorized dealers, you can get a Tissot cheaply as well; I've seen them for not much over $100.)
Chaining has its own benefits, but I don't think this fits the definition of "Pythonic". Then again, "Pythonic" is highly debatable. Still, you can always break a big chain of operations down into smaller chains, using good variable names in between.
In Python, many operations on lists, like filter and groupby, are implemented as iterators.
Looking at your code, it looks like you're not doing lazy computation (correct me if I'm wrong). Depending on how the list is used, this could have a huge performance impact.
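To make the laziness point concrete: Python's `map()` and `filter()` return iterators, so work only happens when something consumes them. In this minimal sketch (the `expensive` function and its call counter are illustrative, not from the package under discussion), taking the first three matches lazily touches only 15 elements, while the list-based version pushes all 100,000 through first:

```python
from itertools import islice

calls = 0

def expensive(x):
    """Stand-in for a costly per-element operation; counts how often it runs."""
    global calls
    calls += 1
    return x * x

data = range(100_000)

# Eager: materializing a list pushes every element through expensive() up front.
calls = 0
squares = [expensive(x) for x in data]
eager = [y for y in squares if y % 7 == 0][:3]
eager_calls = calls

# Lazy: map() and filter() yield on demand, so islice() stops the work early.
calls = 0
lazy = list(islice(filter(lambda y: y % 7 == 0, map(expensive, data)), 3))
lazy_calls = calls

print(eager, eager_calls, lazy_calls)  # [0, 49, 196] 100000 15
```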
I understand the unpythonic nature of Arrays may startle some hardcore Pythonistas, but the ability to chain functions was one of the main reasons I wrote the package: I find nested function calls ugly and sometimes rather hard to decipher.
Regarding performance, Arrays aren't meant to be super high-performing but rather a simple way to manipulate sequences. For the best performance, you should go with plain Python, toolz, or similar.
I am with you on this. Personally, I would rather continue using Toolz (https://github.com/pytoolz/toolz), and contribute additional helper/utility methods to that library.
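For the nested-calls-versus-chaining question, toolz already provides `toolz.pipe(data, *funcs)`, which threads a value through functions left to right. A dependency-free sketch of the same idea (our own `pipe`, not toolz's code) keeps plain iterables and built-in functions while reading top to bottom; the `words` data is made up:

```python
from functools import partial, reduce

def pipe(data, *funcs):
    """Thread data through funcs left to right: pipe(x, f, g) == g(f(x)).

    Same calling convention as toolz.pipe; written here to stay dependency-free.
    """
    return reduce(lambda acc, fn: fn(acc), funcs, data)

words = ["apple", "Banana", "cherry", "avocado", "Blueberry"]

# Nested calls: the first operation applied is the innermost one.
nested = sorted(set(map(str.lower, filter(lambda w: len(w) > 5, words))))

# Piped: identical operations, listed in the order they run.
piped = pipe(
    words,
    partial(filter, lambda w: len(w) > 5),
    partial(map, str.lower),
    set,
    sorted,
)
print(piped)  # ['avocado', 'banana', 'blueberry', 'cherry']
```

Because each stage is an ordinary function over an iterable, nothing here needs a special container type.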
The whole point of some things being functions versus methods is that they are generic rather than specialized. The generic iterator protocol is probably the best feature about the Python language, and it's both a damn shame and bad design to not use it.
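As an illustration of the generic-function argument: a helper written against the iterator protocol works on lists, ranges, generators, or files alike, with no custom class. `chunked` is a hypothetical name for this sketch (more-itertools has a similar helper, and Python 3.12 adds `itertools.batched`, which yields tuples):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def chunked(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Lazily yield successive lists of up to `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# One function, many iterable types.
from_list = list(chunked([1, 2, 3, 4, 5], 2))    # [[1, 2], [3, 4], [5]]
from_range = list(chunked(range(6), 3))          # [[0, 1, 2], [3, 4, 5]]
# Works on a huge generator too: only the first 4 items are ever produced.
first = next(chunked((n * n for n in range(10**9)), 4))  # [0, 1, 4, 9]
```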
If you really wanted to make an improvement over built-in lists, the thing to do would be to implement some kind of fully lazy "query planning" engine, like the one Apache Spark has. Every method call registers a new operation with the query planner but does not execute it; execution only occurs when you explicitly request it. That way, readable code that would take multiple passes over the data can effectively be compiled into efficient internal operations that make only one pass, or at least fewer passes. This also naturally lends itself to parallelization/concurrency.
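A toy version of that idea, far simpler than Spark's planner: each method call only records a step, and `collect()` fuses all recorded maps and filters into a single pass over the source. `LazyQuery`, `numbers`, and the read counter are hypothetical names for this sketch; a real planner would also reorder, prune, and parallelize.

```python
from typing import Any, Callable, Iterable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]

class LazyQuery:
    """Toy planner: map/filter calls are only recorded; collect() runs them in one pass."""

    def __init__(self, source: Iterable[Any], steps: Tuple[Step, ...] = ()):
        self._source = source
        self._steps = steps

    def map(self, fn: Callable[[Any], Any]) -> "LazyQuery":
        return LazyQuery(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyQuery":
        return LazyQuery(self._source, self._steps + (("filter", pred),))

    def collect(self) -> List[Any]:
        out = []
        for item in self._source:          # the single pass over the data
            for kind, fn in self._steps:   # apply the fused pipeline per item
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    break                  # filtered out: skip remaining steps
            else:
                out.append(item)           # survived every step
        return out

reads = 0

def numbers(n):
    """Source that counts how many items are actually read."""
    global reads
    for i in range(n):
        reads += 1
        yield i

query = (LazyQuery(numbers(10))
         .map(lambda x: x + 1)
         .filter(lambda x: x % 2 == 0)
         .map(lambda x: x * 10))
reads_before = reads      # 0: building the query executed nothing
result = query.collect()  # [20, 40, 60, 80, 100], after reading each item once
```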
Dask does the lazy evaluation and query planning thing on numpy arrays and pandas dataframes, and can execute in parallel. It mimics most of their native interfaces which makes it a pretty easy drop-in.
> But, You can always break down big chain of operations, into smaller chain using good variable naming in-between.
I don't think so. Very frequently the intermediate values represent nothing in particular and naming them simply results in visual noise.
I think this is comparable to SQL or LINQ statements. Consider what those would look like if you had to name every intermediate value instead of being able to filter and group on the fly.
Of course you can make a mess out of those too, by building huge unreadable expressions, but that's also an extreme, similar to naming every intermediate step.
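To make the SQL comparison concrete, here is a `GROUP BY`-style aggregation written once as a single expression and once with every intermediate named; the `orders` data is made up for illustration. Whether names like `paid_sorted` help or just add noise is exactly the judgment call being debated here.

```python
from itertools import groupby
from operator import itemgetter

orders = [
    {"customer": "ann", "amount": 30, "paid": True},
    {"customer": "bob", "amount": 20, "paid": False},
    {"customer": "ann", "amount": 15, "paid": True},
    {"customer": "bob", "amount": 50, "paid": True},
]
by_customer_key = itemgetter("customer")

# Roughly: SELECT customer, SUM(amount) FROM orders WHERE paid GROUP BY customer
# One expression, filtering and grouping on the fly (groupby needs sorted input).
totals = {
    customer: sum(o["amount"] for o in group)
    for customer, group in groupby(
        sorted((o for o in orders if o["paid"]), key=by_customer_key),
        key=by_customer_key,
    )
}

# The same query with every intermediate step given a name.
paid_orders = [o for o in orders if o["paid"]]
paid_sorted = sorted(paid_orders, key=by_customer_key)
grouped = groupby(paid_sorted, key=by_customer_key)
named_totals = {customer: sum(o["amount"] for o in group) for customer, group in grouped}
```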
I'm working on an ANN plugin for Elasticsearch. All data is stored on disk, you automatically get horizontal/distributed scaling handled by ES, and you can combine ANN queries with Elasticsearch queries. http://elastiknn.klibisz.com/
It's currently not as fast as the in-memory alternatives, though it's not a perfect apples-to-apples comparison: data is stored on disk, it's a JVM implementation rather than C/C++, and it's optimized for single queries rather than bulk.
The Lucene implementation seems early and slow-moving; it seems they are trying to create new storage formats and use graph-based search methods. OpenDistro wrapped a C++ binary that also uses a graph-based method. It works quite well, but only for L2 similarity, and it comes with the operational burden of running a rather large sidecar process completely disjoint from the JVM.
The approach I've taken is to support five similarity functions (L1, L2, Angular, Jaccard, Hamming), support sparse and dense vectors, implement everything inside the JVM with no sidecar processes and no changes to Lucene, and to use hashing-based search methods (i.e. LSH). IMO the last point has a clear advantage over using graph-based methods, because the hashes are treated just like regular text tokens which is clearly the optimal access pattern in ES/Lucene. Of course it will likely lose to a C++ implementation in terms of raw speed because it's the JVM, but IMO that matters less than making the plugin trivial to run and scale.
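A minimal sketch of hashing-based search for angular similarity, using random-hyperplane LSH (one common LSH family for that metric; Elastiknn's actual hashing and token layout may well differ): each vector becomes a few string tokens, and a plain dict acting as an inverted index, standing in for Lucene's postings lists, finds candidates by counting shared tokens. The parameters (8 tables of 4 bits), the `table:bits` token format, and the toy vectors are arbitrary choices for illustration.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

def make_hyperplanes(dim: int, num_tables: int, bits: int, seed: int = 0):
    """Random Gaussian hyperplanes: num_tables groups of `bits` planes each."""
    rng = random.Random(seed)
    return [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
            for _ in range(num_tables)]

def lsh_tokens(vec: Sequence[float], planes) -> List[str]:
    """One token per table: table id plus the vector's sign pattern against its planes."""
    tokens = []
    for t, table in enumerate(planes):
        bits = "".join("1" if sum(p * v for p, v in zip(plane, vec)) >= 0 else "0"
                       for plane in table)
        tokens.append(f"{t}:{bits}")
    return tokens

planes = make_hyperplanes(dim=3, num_tables=8, bits=4)
docs = {"a": [1.0, 0.9, 0.0], "b": [0.9, 1.0, 0.1], "c": [-1.0, 0.2, -0.9]}

# Inverted index: token -> doc ids, just like text terms in a postings list.
index: Dict[str, List[str]] = defaultdict(list)
for doc_id, vec in docs.items():
    for tok in lsh_tokens(vec, planes):
        index[tok].append(doc_id)

def query(vec: Sequence[float], k: int = 2) -> List[str]:
    """Rank docs by how many hash tokens they share with the query vector."""
    hits = Counter(d for tok in lsh_tokens(vec, planes) for d in index.get(tok, []))
    return [d for d, _ in hits.most_common(k)]

print(query([1.0, 1.0, 0.05]))
```

Since the tokens are just strings, they can be stored and matched exactly like ordinary terms, which is the access-pattern argument made above.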
I don't think there's a definitively better approach yet. It's an interesting problem and it'll be interesting to see what ends up working well.
MPEG-7 has been working on this for decades. ffmpeg has an implementation you can use: https://ffmpeg.org/ffmpeg-all.html#signature-1
While this does pairwise comparisons, it should be possible to simply index the resulting fingerprints using Lucene and perform an inverted-index search; this should scale much better than the technique mentioned.
Could you take (embeddings of) frames from your query video and measure the pairwise similarity to frames from the reference videos, then rank by some summary function of that list of similarity scores?
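A rough sketch of that suggestion, assuming per-frame embeddings already exist as vectors (how they're produced is left open): compute the cosine similarity between every query frame and every reference frame, keep each query frame's best match, and rank references by the mean of those best scores. "Mean of per-frame maxima" is an assumed summary function; a median or top-k average would be equally reasonable. The 2-D vectors are toy data.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def video_score(query: List[Vector], reference: List[Vector]) -> float:
    """Mean over query frames of the best cosine match in the reference video."""
    return sum(max(cosine(q, r) for r in reference) for q in query) / len(query)

def rank_references(query: List[Vector], refs: Dict[str, List[Vector]]) -> List[str]:
    return sorted(refs, key=lambda name: video_score(query, refs[name]), reverse=True)

query_frames = [[1.0, 0.1], [0.9, 0.2]]
references = {
    "clip_a": [[0.0, 1.0], [0.1, 0.9]],
    "clip_b": [[1.0, 0.15], [0.5, 0.5]],
}
print(rank_references(query_frames, references))  # ['clip_b', 'clip_a']
```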
I'm not asking about the structure or how it's organized. I mean... is the filesystem in a file or... how?
Background: I mostly do embedded stuff, so at a glance I would have expected low-level primitives (like HW interactions, registers and stuff), but I see none. So maybe my expectation of interacting with the HW directly when tackling a problem doesn't hold in modern environments.
Even better, but unrelated question... how the heck does an x86 OS request data from the HDD?
You'd presumably have some "block device" abstraction between your filesystem and your device driver; you don't want to re-implement a FS for each type of hardware. On a Linux system, you can read, e.g., /dev/sda1 from userspace, which is what it looks like this filesystem probably does.
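From userspace, a block device like `/dev/sda1` can be read like any other file: open it and read at a byte offset. A minimal sketch using `os.pread` (Unix); the 4096-byte block size is an assumption, and a temporary regular file stands in for the device, since opening a real disk usually needs root:

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size for this sketch

def read_block(path: str, block_no: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read one block at a byte offset; same code for a regular file or /dev/sdXN."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, block_size, block_no * block_size)
    finally:
        os.close(fd)

# Stand-in "disk image": three blocks filled with 0x00, 0x01, 0x02.
with tempfile.NamedTemporaryFile(delete=False) as img:
    for i in range(3):
        img.write(bytes([i]) * BLOCK_SIZE)
    image_path = img.name

block = read_block(image_path, 2)
print(len(block), block[0])  # 4096 2
os.remove(image_path)
```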
As for how you actually request data from the hard drive:
There are the older ATA interfaces, and BIOS routines for them, which I suspect is what most hobbyist OSes would use.
A more modern interface is AHCI. The OSDev wiki has an overview, where you can see how the registers work:
https://wiki.osdev.org/AHCI
As an aside, for our embedded system we use https://github.com/ARMmbed/littlefs as our flash file system. It has a bit of a description of its design and its copy-on-write system, which lets it handle random power loss. It would be nice to see some of these kinds of libraries done in Nim or Rust.
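To illustrate why copy-on-write helps with power loss, here is a toy in-memory model (not littlefs's actual on-disk format): new data always goes to a fresh block, and an update only takes effect once a small metadata record, carrying a revision number and a CRC, lands in the older of two metadata slots. On "mount", the newest slot with a valid CRC wins, so a crash mid-update leaves the previous version readable. `ToyCowStore` and its crash flags are hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Tuple

Meta = Tuple[int, int, int]  # (revision, data block number, crc)

class ToyCowStore:
    """Toy single-file store; data blocks are never overwritten in place (and never freed)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.next_free = 0
        self.meta: List[Optional[Meta]] = [None, None]  # two metadata slots

    @staticmethod
    def _crc(rev: int, block: int) -> int:
        return zlib.crc32(f"{rev}:{block}".encode())

    def _current(self) -> Optional[Meta]:
        """Newest metadata slot whose CRC checks out (what a mount would pick)."""
        valid = [m for m in self.meta if m is not None and m[2] == self._crc(m[0], m[1])]
        return max(valid, default=None)

    def write(self, data: bytes, crash_before_commit: bool = False,
              torn_commit: bool = False) -> None:
        block = self.next_free               # 1. copy: new data goes to a fresh block
        self.next_free += 1
        self.blocks[block] = data
        if crash_before_commit:
            return                           # power lost: metadata still names the old block
        cur = self._current()
        rev = cur[0] + 1 if cur else 1
        crc = self._crc(rev, block)
        if torn_commit:
            crc ^= 1                         # simulate a half-written metadata record
        self.meta[rev % 2] = (rev, block, crc)  # 2. commit into the older slot

    def read(self) -> Optional[bytes]:
        cur = self._current()
        return self.blocks[cur[1]] if cur else None

store = ToyCowStore()
store.write(b"v1")
store.write(b"v2", crash_before_commit=True)
after_crash = store.read()        # still b"v1"
store.write(b"v3", torn_commit=True)
after_torn = store.read()         # still b"v1": the torn record fails its CRC
store.write(b"v4")
final = store.read()              # b"v4"
```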
> how the heck does a x86 OS request data from the HDD?
Entirely too short summary: Use PCI to discover the various devices attached to the CPU. One or more of them are AHCI or NVMe devices. The AHCI and NVMe standards each describe sets of memory-mapped configuration registers and DMA engines. Eventually, you get to a point where you can describe linked lists of transactions to be executed that are semantically similar to preadv, pwritev, and so on.
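For a userspace feel of what "semantically similar to preadv" means: one vectored request fills several buffers from consecutive offsets, loosely like a single scatter-gather transaction handed to the controller. A sketch using Python's `os.preadv` (Linux and some BSDs, Python 3.7+), with a temporary file standing in for a disk; where `preadv` is unavailable it falls back to separate `os.pread` calls:

```python
import os
import tempfile

# Stand-in "disk": 16 bytes with values 0..15.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(range(16)))
    path = f.name

header, body = bytearray(4), bytearray(8)  # scatter targets of different sizes
fd = os.open(path, os.O_RDONLY)
try:
    if hasattr(os, "preadv"):
        # One call: fill header, then body, starting at byte offset 2.
        nread = os.preadv(fd, [header, body], 2)
    else:
        header[:] = os.pread(fd, 4, 2)
        body[:] = os.pread(fd, 8, 6)
        nread = len(header) + len(body)
finally:
    os.close(fd)
    os.remove(path)

print(nread, list(header), list(body))  # 12 [2, 3, 4, 5] [6, 7, 8, 9, 10, 11, 12, 13]
```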
I've never worked at a FAANG either.
But the main reason I want to work at one is the ability to do projects at a scale that isn't possible anywhere else. Some projects are of no use to small companies/startups.
For example: optimizing compile time (a small company has no need to invest in an extra 1-minute speedup), working on high-quality labelled data (I come from an ML background, and this isn't possible at most startups), analytics on data (ethically questionable), working on ad platforms, and working on large-scale systems.
Finally, imagine: even simple changes have a big impact on the real world.
It's possible that these people tried more platforms (face recognition APIs) but only reported those where they got good accuracy at defeating the system.
I personally would like to see tests done on Facebook, uploading these images and checking whether it can recognize them.
This was tested on existing models/face recognition APIs, which means locked, pre-trained models. So they might have learned a way to add pixels such that the model outputs a very different embedding. This is a known issue in deep learning [0][1][2].
I believe a model trained on cloaked images would defeat the purpose and make this technique useless.
[0] Su, Jiawei, Danilo Vasconcellos Vargas, and Kouichi Sakurai. "One pixel attack for fooling deep neural networks." IEEE Transactions on Evolutionary Computation 23.5 (2019): 828-841.
[1] Guo, Chuan, et al. "Countering adversarial images using input transformations." arXiv preprint arXiv:1711.00117 (2017).
[2] Liu, Yanpei, et al. "Delving into transferable adversarial examples and black-box attacks." arXiv preprint arXiv:1611.02770 (2016).
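A toy illustration of that known issue, with a hypothetical linear "embedding model" standing in for a real network: because the output is a weighted sum of the pixels, nudging every pixel by a tiny `eps` in the direction of its weight's sign (the idea behind the fast gradient sign method) moves the output by `eps` times the sum of absolute weights, which can be large even though no single pixel changes by more than 1%. Sizes, seed, and `eps` are arbitrary, and clipping to the valid pixel range is omitted.

```python
import random

random.seed(0)
NUM_PIXELS = 10_000
# Hypothetical "model": one output = dot(weights, pixels).
weights = [random.uniform(-1, 1) for _ in range(NUM_PIXELS)]
image = [random.uniform(0, 1) for _ in range(NUM_PIXELS)]

def embed(pixels):
    return sum(w * p for w, p in zip(weights, pixels))

eps = 0.01  # max change per pixel, 1% of the [0, 1] range
# The gradient of embed() w.r.t. the pixels is just `weights`; step along its sign.
cloaked = [p + eps * (1 if w > 0 else -1) for p, w in zip(image, weights)]

shift = embed(cloaked) - embed(image)
expected = eps * sum(abs(w) for w in weights)
print(round(shift, 2), round(expected, 2))
```

The linear case is the simplest one to reason about; real networks are not linear, which is part of why the cited papers study attacks and defenses empirically.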
But the model will eventually be updated to detect and process the new cloaking images. So, to stay ahead, you decide to create a model that automatically generates different cloaking images, and... The whole system is now just a GAN: https://en.wikipedia.org/wiki/Generative_adversarial_network
I think there's a (hopefully strongly privacy-preserving) combinatorial explosion here, though. If current models can be trained to recognise me accurately enough with, say, 100 training images, this tool might produce unique enough perturbations to require 100 images for each of the possible perturbations, potentially requiring you to train your new model using tens of thousands or millions of cloaked versions of the 100 images for each of the targets in your training set.
(If I were these researchers I'd totally be reaching out to AWS/Azure/GCE for additional research funding... <smirk>)
Not necessarily, because the changes are destructive. They can't restore what was there before, and they can't necessarily infer which image was cloaked and which was not.
The FAQ there addresses that, suggesting you can "dilute down" the ratio of normal-to-cloaked images in the public data sets the model creators train on, and hence reduce their future accuracy.
(So now you just need to somehow get as many cloaked photos of yourself uploaded and tagged to FB as they've collected in the last decade or so...)
If you use a new cloaking image for each picture you upload to social media, then they will all be embedded in a different location for a given feature extractor, and an adversary wouldn't be able to reverse search for linked pictures; that's at least my understanding of how the method would need to be used. But if you keep using the same cloaking image, your adversary could definitely learn that process and effectively undo it.
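A small sketch of why reusing one cloak is risky, assuming a linear feature extractor `W` as an illustrative stand-in for a real network: the same perturbation adds the same offset to every embedding, so the distance between two cloaked photos of you equals the clean distance (up to rounding) and they stay linkable, while fresh per-photo cloaks push them apart. Dimensions, seed, and cloak scale are arbitrary.

```python
import math
import random

rng = random.Random(1)
DIM_IN, DIM_OUT = 50, 8
W = [[rng.gauss(0, 1) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]

def embed(x):
    """Hypothetical linear feature extractor: W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dist(a, b):
    return math.dist(embed(a), embed(b))

def add(x, d):
    return [xi + di for xi, di in zip(x, d)]

def random_cloak(scale=0.5):
    return [rng.uniform(-scale, scale) for _ in range(DIM_IN)]

photo1 = [rng.random() for _ in range(DIM_IN)]
photo2 = [p + rng.gauss(0, 0.01) for p in photo1]  # a second, similar photo

same = random_cloak()
clean = dist(photo1, photo2)
reused = dist(add(photo1, same), add(photo2, same))  # equal to clean up to rounding
fresh = dist(add(photo1, random_cloak()), add(photo2, random_cloak()))
print(clean, reused, fresh)
```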
Technologies/Skills: Machine Learning, Deep Learning, Computer Vision, Python, Pytorch, Flask
Résumé/CV and Email: available on personal website
Personal Website:
pi pi pi pi pi pi pi pi pi pi pika pipi pi pipi pi pi pi pipi pi pi pi pi pi pi pi pipi pi pi pi pi pi pi pi pi pi pi pichu pichu pichu pichu ka chu pipi pipi pipi pipi pi pi pi pi pikachu pi pi pi pi pi pi pi pi pi pi pi pi pikachu pikachu ka ka ka ka pikachu pichu ka ka ka ka ka ka ka ka ka ka ka ka pikachu ka ka ka ka ka ka ka ka ka ka ka pikachu pikachu pipi ka ka ka ka ka ka ka pikachu pi pi pi pi pikachu pikachu pi pi pikachu pi pi pi pikachu pi pi pikachu ka ka ka ka ka ka ka ka ka ka ka ka ka ka ka ka ka ka ka pikachu pi pi pi pi pi pi pi pi pi pi pi pikachu pichu pi pi pi pi pikachu ka ka ka ka ka pikachu pipi ka ka ka ka ka pikachu pi pi pikachu pi pi pi pi pi pi pi pi pi pi pi pikachu ka ka ka ka ka ka ka ka ka ka ka ka pikachu pi pi pi pi pi pi pi pi pi pi pi pi pi pikachu ka ka ka ka ka ka ka ka ka ka ka ka ka ka ka ka ka ka ka pikachu pichu pikachu pipi pi pi pi pi pi pi pi pikachu pi pi pi pi pi pi pikachu pichu pi pikachu
After reading your comment, I cross-checked the results and found that 3 is the optimal size. I even ran fib(40) on it. With size 2 there are many misses, but from 3 onwards the misses are constant (for fib(40), there are only 40 misses, which emulates the O(N) DP approach).
3 is optimal because of how recursion and LRU work. I wish I could explain it using an animation.
It makes sense. f(n) depends on f(n-1) and f(n-2). So if the cache is able to produce these 2 values, you basically get the linear algorithm from the article. I assume the running f(n) also takes up a cache slot, hence 3 instead of 2.
If this theory is correct, every recursive function f(n) requiring access to f(n-x) should have x+1 as its maximum useful cache size.
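The observation is easy to check with `functools.lru_cache`, whose `cache_info()` reports hits and misses. This sketch assumes the base cases fib(0) = 0 and fib(1) = 1, so fib(n) has n + 1 distinct subproblems (41 for fib(40)); a variant with fib(1) = fib(2) = 1 as base cases has one fewer. With `maxsize=3` the miss count matches an unbounded cache, while `maxsize=2` forces recomputation (n = 20 is used there to keep the run short). `make_fib` is a hypothetical helper that builds a fresh cache per size.

```python
from functools import lru_cache

def make_fib(maxsize):
    """Build a fresh memoized Fibonacci with the given LRU cache size."""
    @lru_cache(maxsize=maxsize)
    def fib(n: int) -> int:
        return n if n < 2 else fib(n - 1) + fib(n - 2)
    return fib

for size in (None, 3):
    fib = make_fib(size)
    value = fib(40)
    print(size, value, fib.cache_info().misses)  # 102334155 and 41 misses for both sizes

fib2 = make_fib(2)
fib2(20)
print(2, fib2.cache_info().misses)  # more misses than the 21 distinct subproblems
```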
Remote: Yes
Willing to relocate: Probably yes
Technologies: Python, Flask, PyTorch, Spacy, C
Areas: NLP, CV, Optimization
Résumé/CV: https://read.cv/dipkumar/
Email: dip.patel.ict@gmail.com
Why hire Dip:
- Worked as a machine learning engineer + research engineer + backend engineer.
- Single-handedly deployed multiple ML systems in production
- I believe in creating a baseline first and improving on it instead of going with the biggest weapon.
- Fast learner (worked on various projects ranging from stitching photos to speech intent detection to solving NP-hard problems)
Why not hire Dip:
- You need a Research Scientist instead of an MLE or Research Engineer
- You need a senior (experienced) backend engineer