A survey of the generative AI landscape

An analysis of the capabilities and limitations of AI tools and an exploration of their productive use.

Dec 31, 2022

Recently, generative artificial intelligence has made groundbreaking advances. It can now generate high-quality images, graphics, music and even 3D models. Any desired text, from tweets to poems to entire essays, can also be generated from a few words of input. AI development progresses so quickly that any list of AI tools becomes outdated before it can be finished. With a proliferation of tools, all with shiny, overselling demo videos and often similar-looking websites, determining the capabilities as well as the limitations of a given tool can be challenging. Further difficulties arise in working out where a tool is susceptible to errors, validating the correctness of its output, ascertaining its usefulness, and discovering how it can be deployed productively.

This essay endeavors to organise the capabilities of AI tools into a spectrum and to elucidate certain general conceptual limitations, in order to gain a better understanding of them. Along the way, it will of course introduce you to some useful tools (complete list).

It is key to understand that AI tools are probabilistic, which makes it difficult to evaluate their capabilities. Was the perfectly generated essay just luck? Or does the tool make so many errors that all the time saved writing has to be invested again in verifying the output? Sadly, this can mostly be found out only by trial and error, and impressive-looking demo videos must therefore be taken with a grain of salt. It’s also often unclear whether the capability shown in the video generalizes to your use case. Even if it does, limitations such as a maximum input length can handicap its usefulness.
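The "was it just luck?" question can be made a little more concrete. The following is a minimal sketch, not a method any particular tool provides: it assumes you have run a tool several times, judged each output as acceptable or not, and want a rough range for the tool's true success rate. The function name `wilson_interval` and the 95% confidence level are choices made for this illustration; the formula is the standard Wilson score interval.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate confidence interval for a success rate.

    successes: number of outputs you judged acceptable
    trials:    number of times you ran the tool
    z:         normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2))
        / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # One perfect essay out of one attempt says very little.
    print(wilson_interval(1, 1))
    # Eight acceptable outputs out of ten narrows the range somewhat.
    print(wilson_interval(8, 10))
```

With a single successful attempt, the lower end of the interval sits at roughly one in five, which is the statistical version of "one good demo proves little". Narrowing the range requires more trials, and that is exactly the testing effort described above.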

This is compounded by the fact that AI tools are relatively brittle, and errors can also be caused by the user's input. If you transcribe a podcast with an AI tool and it produces an unconvincing, error-ridden text, is it because the tool is bad or because the audio quality was poor? Was the speaker talking too fast, or was their accent too thick? Or maybe you just used the wrong parameters, and different settings would have produced a satisfactory result. Since testing these tools often requires considerable effort, understanding their conceptual limitations helps to get a better picture of possible use cases.

It is again central to remember that AI tools are probabilistic and therefore error-prone, so their outputs cannot always be relied on. This doesn't matter when they are used as inspirational tools, where one doesn’t have a clear idea of the desired result, but it makes them unfit for unsupervised processes. In general, generative AI tools are best understood as aiding humans in their tasks, though how much human oversight is needed depends on how costly errors are in the task at hand.
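Why even a small error rate makes unsupervised use risky can be shown with a short calculation. This is a simplified sketch under a loud assumption: every run succeeds with the same probability and runs are independent of each other, which real tools will not strictly satisfy. The function name and the example figures (95% per-run success, 20 runs) are hypothetical and chosen only for illustration.

```python
def prob_all_correct(per_run_success, runs):
    """Probability that every one of `runs` outputs is correct.

    Assumes each run succeeds independently with probability
    `per_run_success` (a simplification, not a property of any real tool).
    """
    if not 0.0 <= per_run_success <= 1.0:
        raise ValueError("per_run_success must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must not be negative")
    return per_run_success ** runs


if __name__ == "__main__":
    # Hypothetical figures: a tool that is right 95% of the time,
    # used 20 times without anyone checking the output.
    print(f"{prob_all_correct(0.95, 20):.2f}")
```

Under these assumptions, a tool that looks reliable on any single run is still more likely than not to produce at least one faulty output across twenty unchecked runs, which is why a human in the loop matters wherever errors are costly.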

AI tools can be broadly categorized into text, graphics and sound. Text of course includes AI tools that can generate texts of all forms, from tweets, poems and essays to narrowly specialized documents (e.g. patent applications). AI-generated texts initially look convincing, but most turn out to ramble and to avoid making a specific point. Furthermore, many generated texts contain factual errors. Fully automated, unedited writing will therefore harm the reputation of its users (the grading of student essays, where somewhat coherent texts usually get a passing grade, will have to evolve as well). Still, these tools can be used to overcome writer's block or to get a first draft that is then edited for clarity and intent. Other tools specifically aim to be writing assistants that autocomplete, rephrase or change the style of your text.

Interestingly, the capabilities of text-generating AI tools also extend to all tasks that can be reduced to text: translations, code, SQL queries, shell commands, regular expressions and Excel formulas. Most of these specialized tools are based on OpenAI’s proprietary GPT-3 model. OpenAI’s release of the much more advanced and currently free-to-use ChatGPT, which combines all of the above capabilities in a single tool, poses an interesting challenge: it remains to be seen whether the specialized companies can provide sufficient utility to warrant separate subscriptions.

Graphics include image recognition and image generation and all their derivatives, such as logos, drawings and fonts, as well as 3D models and video editors.

Image generation tools work best if they don't have to fulfill one's specifications 100%. For game design, illustrating articles and idea generation, where one has only a vague idea of the desired outcome, they work pretty well out of the box. Completing an Upwork design task that has concrete specifications usually requires correcting some details manually.

Interesting derivatives that make use of AI’s image recognition capabilities include CountThings, which can count how many nails, logs or Covid vials are in a photo, and Tailorbird, which claims to generate floor plans and measurements from photos with an accuracy of 98%. (Tailorbird’s website is very vague, but conceptually the claim seems plausible.)

Lastly, sound includes text-to-speech and transcription, but also voice generation, video dubbing and music generation.

One key potential left out of this analysis is generative search, which I will explore in depth in an upcoming essay.

Click to access the complete list of over 135 tools

Future Potentialis
