Introduction
Data Intuitive is committed to streamlining bioinformatics workflow development. Our core tool, Viash [1], was developed to simplify and standardize computational workflow management by automatically generating essential boilerplate code, a cornerstone feature that relies on a robust, deterministic, rule-based system built on our extensive experience in the field.
Inspired by recent advances in the field of artificial intelligence (AI) and large language models (LLMs), Data Intuitive set out to explore how LLMs can further elevate Viash and improve the workflow development experience with even greater automation of code.
The potential of LLMs for code generation
In our exploration of applying LLMs for code generation in bioinformatics workflow development, we have witnessed several key aspects of how LLMs are transforming the development landscape.
One notable advantage is the ability to generate context-specific boilerplate code, which streamlines the initial setup of projects and frees up developer time for more complex tasks. Techniques such as retrieval-augmented generation (RAG) [2], fine-tuning of LLMs [3], or few-shot prompting [4] empower LLMs to produce Viash-specific code and tailored solutions for the tasks at hand.
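To illustrate one of these techniques, the sketch below shows how a few-shot prompt might be assembled to steer a general-purpose LLM toward Viash-style output. The example task, the config snippet, and the `build_few_shot_prompt` helper are all hypothetical; the resulting string would be sent to an LLM of choice.

```python
# Hypothetical worked example(s) the model should imitate: a task
# description paired with a Viash-style config. Real prompts would
# include several such pairs drawn from a curated example set.
EXAMPLES = [
    (
        "Wrap a Python script that trims FASTQ reads.",
        "name: trim_reads\n"
        "arguments:\n"
        "  - name: --input\n"
        "    type: file\n"
        "engines:\n"
        "  - type: docker\n"
        "    image: python:3.11",
    ),
]

def build_few_shot_prompt(task: str) -> str:
    """Prepend worked examples so the model imitates their format."""
    parts = ["Generate a Viash config for the given task.\n"]
    for example_task, example_config in EXAMPLES:
        parts.append(f"Task: {example_task}\nConfig:\n{example_config}\n")
    # End with the new task and an open "Config:" slot for the model
    # to complete.
    parts.append(f"Task: {task}\nConfig:\n")
    return "\n".join(parts)

prompt = build_few_shot_prompt("Wrap an R script that normalizes counts.")
print(prompt)
```

Because the examples fix both the format and the level of detail, the model's completion tends to follow the same structure, which is the main appeal of few-shot prompting over zero-shot generation.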
Additionally, LLMs have demonstrated strong syntax proficiency, enabling them to generate correct code snippets across various programming languages, which can help maintain code quality and consistency.
These capabilities not only speed up the coding process but also lower the barrier to entry for workflow development, allowing users with varying levels of experience to engage more readily in programming projects.
Limitations of LLM-based code generation
Despite the potential of LLMs in automating code generation, we quickly encountered several limitations.
One significant challenge is their lack of logical reasoning; bioinformatics workflows can be highly complex, and LLMs do not inherently possess a structured reasoning process to navigate this intricacy. For example, when developing code to run specific steps of a workflow, an LLM may correctly identify variables to be parameterized, yet fail to understand the dependencies required to run a specific tool effectively. Similarly, when tasked with combining multiple workflow steps into a coherent sequence, LLMs often struggle to grasp how to integrate them logically.
Additionally, the temperature setting of the LLM plays a critical role: a high temperature may generate code with insufficient adherence to syntax and precision, whereas a low temperature, while necessary for enforcing strict syntax rules, can limit the model's ability to adapt its output to nuanced context requirements.
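The trade-off above comes from how temperature rescales token scores before sampling. The minimal sketch below (with made-up logits for three candidate next tokens) shows that a low temperature concentrates nearly all probability on the top-scoring token, while a high temperature flattens the distribution and makes off-syntax choices more likely.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw token scores into sampling probabilities.

    Dividing the logits by the temperature before the softmax
    sharpens the distribution when temperature < 1 (near-greedy,
    syntax-faithful output) and flattens it when temperature > 1
    (more varied, but less precise output).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, temperature=0.2)
high = softmax_with_temperature(logits, temperature=2.0)

print("low temperature: ", [round(p, 3) for p in low])
print("high temperature:", [round(p, 3) for p in high])
```

At temperature 0.2 the first token receives more than 99% of the probability mass, whereas at temperature 2.0 the three tokens end up much closer together, which is exactly the precision-versus-flexibility tension described above.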
Even in simpler use cases where the overall code structure appears correct, LLMs can introduce (subtle) bugs that render the generated code unusable, resulting in significant time lost debugging and troubleshooting.
Considerations beyond the technology
Beyond inherent limitations of the technology behind LLMs, there are several other risks related to the rapidly evolving landscape of LLMs.
Commercially available LLMs are often deprecated quickly, necessitating regular retraining to maintain their relevance. Moreover, when working with intellectual property-protected data, which is often the case with bioinformatics workflows, commercial LLMs pose significant security risks because of potential context leakage. While the use of open-source LLMs could address those concerns, the current global shortage of GPUs, driven by a surge in demand and a lag in supply, has made it difficult to work with those models.
The environmental impact of training and deploying LLMs is also a growing concern, as these models require massive computational resources both during training and when used (at scale), leading to high energy consumption and carbon emissions.
These challenges are compounded by the risk of fluctuations in costs for GPUs and commercially available LLMs, which can impact long-term project sustainability.
Balancing benefits with limitations
The potential for AI-generated code is clear: it can accelerate development, reduce repetitive tasks, and help bridge gaps for new programmers. Specific LLM-based tools enhance the development process in various ways, including interpreting code (e.g., chatbots that explain code) or suggestion-based generation of smaller code snippets (e.g., coding assistants such as GitHub Copilot). However, these benefits must be weighed against the risks of over-reliance on AI, particularly for complex or mission-critical tasks.
While we continue to monitor the advancements in AI and LLMs with interest, we believe that the current technology and existing services are not mature enough to deliver substantial improvements in our automated processes. We are committed to an approach grounded in high ethical and quality standards, ensuring the integrity of our products.