Much of the public hype around Artificial Intelligence, and "generative AI" in particular, centers on image manipulation, a bit like putting professional visual effects within anyone's reach at a keystroke. While this was initially the most advertised use of AI, it falls far short of capturing the depth of the technological shift. Parts of AI may well be overrated, especially when statistical likelihood is passed off as reasoning, but the shift is definitely not just about the user-facing toy: AI brings many technological breakthroughs, of course in visual and audio recognition, but also in browsing and analyzing large-scale data, managing massive synchronization bandwidth, and image, audio, and voice processing. It is also about bringing optimized mathematical computation into software development, thanks to a now-legitimized use of GPUs.
From a software development standpoint, AI already has a significant value, and it may not be what first comes to mind: bringing back research-based, algorithm-centric, truly value-added coding. Because of high demand, development became a commodity, with ever more redundancy and brainless, search-engine-driven copy/paste. The industry may have forgotten for a while that code itself was all about stripping unnecessary redundancy out of day-to-day tasks, "automation for the brain". Yet code became heavily redundant globally, with too many developments being just small variations of one another, in part because doing what everybody else is already doing can be reassuring. Current work on AI requires getting back to reading the literature on a subject and taking up a pencil to draft something that does not exist yet.
Still, at this point in time, the biggest flaw of AI is certainly the too-early success of AI itself...
AI feeds on "abstract" data: abstract in the sense that data in the digitized world are just numbers, numbers which only gain meaning when linked to their effects in the real world. The statistical distribution of those numbers defines models and their evolution. So as long as real-life, human-generated data keeps representing the major share of samples, models will keep fine-tuning themselves and "learning". However, once AI-generated content gets into circulation, a feedback loop starts and, in the worst case, can lead to a learning deadlock. How can something qualify as a source when it is already the result of a previously washed-out extraction?
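To make that feedback loop concrete, here is a toy simulation of our own, not a claim about any specific model: a "model" repeatedly fits a Gaussian to its data, and each next generation trains only on samples drawn from the previous fit. The only assumption baked in is that generated content under-represents rare, tail material (modeled here by clipping at two standard deviations), which is enough to make the learned distribution narrow generation after generation.

```python
import random
import statistics

def fit(samples):
    """'Train' a toy model: estimate the mean and standard deviation of its data."""
    return statistics.mean(samples), statistics.stdev(samples)

def generate(mean, stdev, n):
    """Produce 'AI-generated' content from the fitted model.

    Illustrative assumption: generated content under-represents rare, tail
    material, modeled here by rejecting samples beyond two standard deviations.
    """
    out = []
    while len(out) < n:
        x = random.gauss(mean, stdev)
        if abs(x - mean) <= 2 * stdev:
            out.append(x)
    return out

# Generation 0: purely human-generated data, rich in variation.
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]
mean, stdev = fit(data)
print(f"generation  0: learned stdev = {stdev:.3f}")

# Worst case from the post: later generations train only on circulating AI output.
for generation in range(1, 11):
    data = generate(mean, stdev, 10_000)
    mean, stdev = fit(data)
    print(f"generation {generation:2d}: learned stdev = {stdev:.3f}")

# The learned spread shrinks every generation: the model's notion of
# "possible data" keeps narrowing -- the learning deadlock in miniature.
```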
Let's take context-based human language translation as an example. Unless we consider, inappropriately, that the digitized world already holds an absolute data set of every context leading to the most appropriate translation of a sentence, the gradual circulation of AI-based translations, which will inevitably take over because of their cost-saving benefits, will necessarily limit the number of result variations, especially in the digital realm. AI-based translations will stop improving. The optimistic view, though, is that once AI companies have scavenged all existing data and realize that their models have stopped improving, they will certainly become the first to hire for and re-develop human-based jobs... such as live translators, to stay with the earlier example.
Back to the basics: Benefits of Statistical Analysis and Dynamic Models
As early as 2012, while facing the challenge of bringing relatively strong security to devices that did not support or allow any kind of hardware-based security, our team worked on a "dynamic security" feature, trying to overcome the fact that any static software decision can be hacked given a long enough time period. At that time, security on mobile was a big, big challenge: no separation between program and data spaces, limited notions of privileged execution, etc.
The in-house development had the internal code name "El Greco", referring to the advanced computer in the movie Ocean's Thirteen. It consisted of analyzing large data sets and building pseudo-"dynamic" security models around them, with a client-side and a server-side component.
While everything was purely anonymous and already in line with privacy policies that would only come to light a few years later, the analysis of these enormous data sets from millions of users built the first behavior-based security models of our systems. Every month or so, we met with a strategic customer, a leading telecom operator, to review the findings and enable the updated models in their production environment. In spirit, this was very comparable to how LLMs are trained and refreshed today.
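For readers curious what a "dynamic", behavior-based security model can look like at its simplest, here is a minimal sketch of the general idea, our illustration rather than the El Greco implementation: a server-side job learns per-feature baselines from anonymous usage data, and a client-side check scores new events against those baselines. The feature names and threshold are hypothetical.

```python
import statistics

# --- Server side: learn a baseline from a large, anonymous data set ----------
def build_model(events, features):
    """Learn per-feature mean/stdev baselines from historical events."""
    model = {}
    for f in features:
        values = [e[f] for e in events]
        model[f] = (statistics.mean(values), statistics.stdev(values) or 1.0)
    return model

# --- Client side: score a new event against the published model --------------
def anomaly_score(event, model):
    """Sum of absolute z-scores: how far this event sits from 'normal' behavior."""
    score = 0.0
    for f, (mean, stdev) in model.items():
        score += abs(event[f] - mean) / stdev
    return score

# Hypothetical, anonymous behavioral features.
FEATURES = ["api_calls_per_min", "failed_auth_ratio", "bytes_out_per_min"]

history = [
    {"api_calls_per_min": 12, "failed_auth_ratio": 0.01, "bytes_out_per_min": 4_000},
    {"api_calls_per_min": 15, "failed_auth_ratio": 0.02, "bytes_out_per_min": 5_200},
    {"api_calls_per_min": 11, "failed_auth_ratio": 0.00, "bytes_out_per_min": 3_800},
    {"api_calls_per_min": 14, "failed_auth_ratio": 0.01, "bytes_out_per_min": 4_600},
]

model = build_model(history, FEATURES)   # refreshed periodically on the server side
event = {"api_calls_per_min": 95, "failed_auth_ratio": 0.40, "bytes_out_per_min": 48_000}

THRESHOLD = 9.0                          # tuned per deployment
if anomaly_score(event, model) > THRESHOLD:
    print("suspicious behavior: trigger proactive counter-measures")
else:
    print("within normal behavioral envelope")
```

The value of the "dynamic" part is exactly the monthly cycle described above: the baselines are re-learned as real usage evolves, so the security decision is never a single static rule waiting to be reverse-engineered.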
So statistical analysis and dynamic models, under the "AI" umbrella, can definitely be beneficial for security. At least it was very successful for us... with a "Days without incidents: 0" sign still on the wall. The models enabled "proactive" security, which was and may still be invaluable. But let's also take a look at the downside for a fuller picture.
AI: the Next Great Tool for Hackers? Maybe... Sadly... :(
AI is a new "tool", but one that still needs a lot of human intelligence to be really effective.
It was in the subway: an ad from a major toy brand, with a picture of what looks, at first, like a nice and friendly family. Still, watching the ad came with an inexplicable feeling of weirdness. Looking closer: the proportions of the faces do not match the bodies and the perspective, one of the individuals has six fingers, the legs point in directions that are simply not possible, and so on. Of course AI can help quickly compose an improbable situation into a single image, but before running a massive billboard campaign, we would still expect a review by a professional photographer. It would represent just a fraction of the overall cost and would massively improve the campaign's effectiveness by not leaving viewers with that uncomfortable feeling.
So we're already here: at the moment of overconfidence and shortcuts, relying exclusively on this still unproven new tool. And many companies have already taken major steps in that direction.
The argument is absolutely not about looking backward. The future is forward. And the future includes these new developments, developments currently lumped under the "AI" frenzy. As stated at the outset, "AI" carries a lot of technological value. But there are different ways to get to that future, and shortcuts could be damaging.
So let's think for a second about a few drawbacks of a blind shift to AI with regard to the security of software-based systems.
At this point, AI code generation or correction draws on the internet's code collection as its source, with limited depth in its learning iterations. That brings two major drawbacks: #1, shuffling legacy code along with all its flaws, and #2, predictability.
About #1, it is well established today that open source, or any widely available code, can be equally helpful or harmful: helpful as a way to get a feature going quickly, harmful as a Trojan horse in the final product. Relying on AI to inject references to unvetted code, without careful review, is one step further away from security, which was already weakened by open libraries and frameworks too exposed to uncontrolled changes.
About #2, the challenge lies in the lack of randomness and diversity when AI-generated code is introduced into a product. With some insight into the typical requests made to an AI generation tool, insight which already exists somewhere, one ends up collecting common implementation recipes. Where hackers used to hunt for loopholes in human code, random human errors, they will now be able to assume the protocol and higher-level mechanisms used for a given problem.
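To make the predictability concern concrete, here is a small, hypothetical sketch of our own: if many products embed the same generated "recipe", a single normalized fingerprint of a known-weak snippet is enough to flag every one of them, which is exactly what makes such attacks scale. The snippets, product names, and the weak pattern itself are all made up for illustration.

```python
import hashlib
import re

def fingerprint(snippet: str) -> str:
    """Normalize away identifiers and whitespace, then hash the code 'shape'."""
    normalized = re.sub(r"\b[A-Za-z_][A-Za-z0-9_]*\b", "ID", snippet)  # rename everything
    normalized = re.sub(r"\s+", "", normalized)                        # drop whitespace
    return hashlib.sha256(normalized.encode()).hexdigest()

# A hypothetical weak recipe an AI assistant might hand out again and again
# (hard-coded IV, no authentication -- purely illustrative, never executed here).
weak_recipe = """
cipher = AES.new(key, AES.MODE_CBC, b"0000000000000000")
data = cipher.encrypt(pad(payload, 16))
"""

# The "same" code as it appears in three different products, identifiers renamed.
product_snippets = {
    "product_a": weak_recipe.replace("payload", "msg").replace("data", "blob"),
    "product_b": weak_recipe.replace("cipher", "c").replace("key", "k"),
    "product_c": "cipher = AES.new(key, AES.MODE_GCM)\n",  # a genuinely different implementation
}

known_weak = fingerprint(weak_recipe)
for name, snippet in product_snippets.items():
    hit = fingerprint(snippet) == known_weak
    print(f"{name}: {'same recipe, same loophole' if hit else 'different implementation'}")
```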
So we get back to our subway ad: AI can be an amazing tool to step up the game, but only when used without shortcuts.
How do we use AI at idvu.co?
For our research team, AI is a great tool: it speeds up understanding of what is already known... and therefore not worth pursuing. If AI has an elaborate and valid answer, then the subject has already been covered, one way or another. However, if AI has no meaningful answer, or an invalid one because it merely reshuffled an early stage of research on the subject, then there is something worth looking into: it means there is no easy answer to the problem, and only careful, in-depth, true engineering work can change that. We have used AI many times to verify that suggested solutions, while sometimes technically correct from a programming standpoint, do not effectively work.
For our development teams, the use of AI for assisted code generation is very limited, for several reasons. First, development teams take over research work that was narrowed down precisely for its uniqueness, work to which generic solutions cannot apply. Second, we need an exhaustive "bill of materials", with a detailed understanding of all components involved, especially from a security perspective. Third, using AI means that everything we do is sent to servers somewhere and used for model training, most likely with humans reviewing the outcome.
Last, as mentioned at the start: the "AI" scope covers many advanced aspects of processing images and sound, and of carrying and manipulating data. All these aspects intersect with and benefit our domain, and they have become strong areas of interest for us, with some active developments.
While "post quantum" is a pretty fancy term that sounds like it comes from a dystopian apocalyptic novel, it has become a paradigm of increasing weight for the largest business interests. So the question is really about exploring solutions for a prequantum reality.
It's always the same business case for securing movies, so let's go with it one more time: productions cost millions, if not hundreds of millions, of dollars. Enabling the longest possible lifetime for a production helps secure the initial investment in the first place. While the use of quantum computing for everyday hacking is still a long shot, progress is happening at an exponential pace, and resources at a government-like scale have already found their way against major business interests.
DRM, or Digital Rights Management, is a system that enforces business rules when playing audio or video media. Specifically, the stack attempts to restrict and block digital copying. Business rules can be about limiting access in time, controlling the number of devices allowed to play the video, etc. For that purpose, the video itself is encrypted, and the encryption key is shared between a license server and the DRM stack on the device using a proprietary, secure protocol.
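As a rough mental model of that exchange, here is a simplified sketch with hypothetical field names, not the protocol of any actual DRM: the license server returns the content key wrapped for a specific device, together with the business rules the client must enforce. It uses the widely available `cryptography` package for the key wrapping.

```python
import json
import os
import time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Provisioned ahead of time: a per-device key known to the license server
# (in real DRMs this is rooted in device credentials, often in hardware).
device_key = AESGCM.generate_key(bit_length=256)

def issue_license(content_id: str, content_key: bytes, device_key: bytes) -> dict:
    """License server side: wrap the content key and attach business rules."""
    nonce = os.urandom(12)
    wrapped = AESGCM(device_key).encrypt(nonce, content_key, content_id.encode())
    return {
        "content_id": content_id,
        "nonce": nonce.hex(),
        "wrapped_key": wrapped.hex(),
        "policy": {                                       # the "business rules"
            "expires_at": int(time.time()) + 48 * 3600,   # e.g. a 48-hour rental
            "max_devices": 3,
            "hd_allowed": False,
        },
    }

def open_license(license_obj: dict, device_key: bytes) -> bytes:
    """DRM stack side: enforce the policy, then unwrap the content key."""
    if time.time() > license_obj["policy"]["expires_at"]:
        raise PermissionError("license expired")
    return AESGCM(device_key).decrypt(
        bytes.fromhex(license_obj["nonce"]),
        bytes.fromhex(license_obj["wrapped_key"]),
        license_obj["content_id"].encode(),
    )

content_key = AESGCM.generate_key(bit_length=128)   # the key protecting the video
license_obj = issue_license("movie-1234", content_key, device_key)
assert open_license(license_obj, device_key) == content_key
print(json.dumps(license_obj["policy"], indent=2))
```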
Typical deployments still rely exclusively on standard DRMs to digitally secure the assets.
A significant strength of existing DRM implementations is that only the pure audio or video portion of the file is encrypted, not the whole file with its headers and so on. The reason: a key challenge when attacking encryption is knowing when you have found the right key. Decryption basically always "works": any key will turn one binary string into another binary string; the question is whether the decrypted message makes sense. If the content were a text message, successful decryption would mean the message can be read and carries its original meaning. The same goes for video: the key is found when the video can be played. That seems obvious, but it can take real work to implement such a success criterion, and because video decoders are designed to be resilient to corrupted streams, they will themselves try to keep playing the video as much as possible. A video expert, though, would still be able to define a likely criterion... because the header is not encrypted, and that can give some clues...
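Here is a toy illustration of that point, a sketch with a deliberately naive success criterion rather than a real attack: every candidate key "decrypts" the payload to some bytes, and only a check that the output looks like video (an H.264-style start code here) separates the right key from the wrong ones. The fake sample and the criterion are our own simplifications.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_ctr(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """AES-CTR is symmetric: the same call encrypts and decrypts."""
    return Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor().update(data)

def looks_like_video(payload: bytes) -> bool:
    """Deliberately naive success criterion: an Annex-B style start code
    (0x00 00 00 01) at the beginning of the decrypted payload."""
    return payload.startswith(b"\x00\x00\x00\x01")

# A fake "video sample": start code + arbitrary bytes standing in for a NAL unit.
clear_sample = b"\x00\x00\x00\x01" + os.urandom(188)

real_key, nonce = os.urandom(16), os.urandom(16)
encrypted_sample = aes_ctr(real_key, nonce, clear_sample)

# An attacker trying candidate keys: decryption always returns *something*;
# only the success criterion tells the right key apart.
for label, candidate in [("wrong key", os.urandom(16)), ("right key", real_key)]:
    attempt = aes_ctr(candidate, nonce, encrypted_sample)
    print(f"{label}: {'plays' if looks_like_video(attempt) else 'garbage'}")
```

Encrypting only the media samples, while leaving the container headers in the clear for playback, is precisely what forces the attacker to build such a criterion themselves, and what a clear header can, unfortunately, help them refine.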
The key exchange mechanism between the license server and the client is unfortunately not "post-quantum" ready. And it seems challenging for existing, well-known DRM systems to evolve quickly enough to address the latest understanding of future quantum attacks, especially because those DRM systems often have a hardware implementation. While "hardware DRM" is considered the safest by today's criteria, it is also "frozen" in time, constrained by the pace of low-level updates at the heart of devices.
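One commonly discussed direction, sketched below under clear assumptions, is a hybrid key exchange: combine a classical exchange (X25519 here) with a post-quantum KEM, so the derived session key stays safe as long as at least one of the two holds. The `PostQuantumKEM` class is a hypothetical placeholder for an ML-KEM/Kyber-style implementation; the X25519 and HKDF parts use the real `cryptography` package.

```python
import os
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

class PostQuantumKEM:
    """Hypothetical placeholder for an ML-KEM (Kyber) style KEM.
    A real deployment would call an actual post-quantum library here."""
    def __init__(self):
        self._secret = os.urandom(32)            # stand-in, NOT a real KEM
    def encapsulate(self):
        return b"pq-ciphertext", self._secret    # (ciphertext, shared secret)
    def decapsulate(self, ciphertext):
        return self._secret

# Classical part: X25519 Diffie-Hellman between license server and device.
server_x = X25519PrivateKey.generate()
device_x = X25519PrivateKey.generate()
classical_secret = device_x.exchange(server_x.public_key())

# Post-quantum part: KEM encapsulation against the device's PQ public key.
kem = PostQuantumKEM()
pq_ciphertext, pq_secret = kem.encapsulate()

# Hybrid derivation: the session key depends on BOTH secrets, so breaking
# only the classical half (e.g. with a future quantum computer) is not enough.
session_key = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
    salt=None,
    info=b"hybrid-drm-license-session",
).derive(classical_secret + pq_secret)

print(f"hybrid session key: {session_key.hex()}")
```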
It is not certain, though, that the security of the key exchange is where quantum attacks would meaningfully matter anyway.
Also, standard DRMs still use relatively short keys compared to the quantum challenge, or even to current computational capabilities: content keys are typically 128-bit AES keys, and Grover's algorithm would roughly halve their effective strength, down to about 64 bits of security. This is a significant weakness.
"Harvest and decrypt" is one of the challenges of today's security: while we still operate with decades-old technologies, content could be archived and later deciphered using quantum techniques.
This may be an outdated perception, but "streaming" has often been considered more secure than "downloading." In the context of "harvest and decrypt," "streaming" using typical protocols is not that helpful, unfortunately. In both cases, content can still be easily retrieved from the source.
One can think of the following elements to be considered on the journey toward post-quantum security:
- minimize "harvest" risks by controlling access to content, whether it's for "streaming" or "download" use.
- make it harder to implement a "success criteria" for decryption, noting such a solution must apply to all kinds of encryption-related attacks.
- significantly increase "full" decryption complexity
- leveraging and working around device capabilities
- existing Studio-approved DRMs are good but certainly not enough.
- increase key exchange complexity using a postquantum-ready implementation
- increase recipe diversity
- keep up with NIST recommendations
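On the "full decryption complexity" point, one illustration, a sketch under simple assumptions rather than a description of any specific product, is per-segment key rotation: derive a distinct key for every few seconds of content, so that recovering one key exposes only a small slice rather than the whole asset. The content identifier and segment sizes below are made up.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

def segment_key(master_key: bytes, content_id: str, segment_index: int) -> bytes:
    """Derive a distinct AES key per media segment from one master key."""
    return HKDF(
        algorithm=hashes.SHA256(),
        length=16,
        salt=None,
        info=f"{content_id}/segment/{segment_index}".encode(),
    ).derive(master_key)

master_key = os.urandom(32)
content_id = "movie-1234"

# Encrypt each (fake) segment under its own derived key.
segments = [os.urandom(64) for _ in range(5)]     # stand-ins for media segments
protected = []
for i, seg in enumerate(segments):
    key, nonce = segment_key(master_key, content_id, i), os.urandom(12)
    protected.append((i, nonce, AESGCM(key).encrypt(nonce, seg, None)))

# Recovering the key of segment 2 decrypts segment 2 only; every other
# segment still requires its own key, so "full" decryption stays expensive.
i, nonce, blob = protected[2]
leaked_key = segment_key(master_key, content_id, 2)
assert AESGCM(leaked_key).decrypt(nonce, blob, None) == segments[2]
print("one leaked segment key exposes exactly one segment")
```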
Bottom line: getting ready for so-called "post-quantum" security first requires expert knowledge of all aspects of the issue, on both the server and client sides: storage, transmission protocols, video encoding and decoding, encryption and decryption, key exchange and key storage, and device capabilities. Absolute security does not exist, and merely converging toward it can be a very long and/or expensive process. Our first piece of advice would therefore be a proper assessment of the right level of security for a given business need.
Of course, we'd like to think that we, at idvu.co, have significant expertise, advanced solutions, and a compelling offering for that challenge.
Almost by definition, this is an informal, conversational page with regular updates from our team members about subjects addressed with our customers or researched by our development teams for our products and solutions.
Posts are not meant to be doctoral theses, so they do not aim to cover every aspect of a topic comprehensively. We hope, however, that they can help in some way and contribute to the discussion.
And because we're not very fond of spamming anybody, we simply share those posts here, on an old-school page for you to browse.
By idvu.co