Pemula Exercism • python

RNA Transcription

Ringkasan Tantangan

Given a DNA strand, return its RNA complement.

Introduction

You work for a bioengineering company that specializes in developing therapeutic solutions.

Your team has just been given a new project to develop a targeted therapy for a rare type of cancer.

It's all very complicated, but the basic idea is that sometimes people's bodies produce too much of a given protein.
That can cause all sorts of havoc.

But if you can create a very specific molecule (called a micro-RNA), it can prevent the protein from being produced.

This technique is called [RNA Interference][rnai].

[rnai]: https://admin.acceleratingscience.com/ask-a-scientist/what-is-rnai/

Instructions

Your task is to determine the RNA complement of a given DNA sequence.

Both DNA and RNA strands are a sequence of nucleotides.

The four nucleotides found in DNA are adenine (A), cytosine (C), guanine (G), and thymine (T).

The four nucleotides found in RNA are adenine (A), cytosine (C), guanine (G), and uracil (U).

Given a DNA strand, its transcribed RNA strand is formed by replacing each nucleotide with its complement:

G -> C
C -> G
T -> A
A -> U

If you want to look at how the inputs and outputs are structured, take a look at the examples in the test suite.

Dig Deeper

translate maketrans

`translate()` with `maketrans()`

LOOKUP = str.maketrans('GCTA', 'CGAU')


def to_rna(dna_strand):
    return dna_strand.translate(LOOKUP)

This approach starts by defining a dictionary (also called a translation table in this context) by calling the maketrans() method.

Python doesn’t enforce having real constant values, but the LOOKUP translation table is defined with all uppercase letters, which is the naming convention for a Python constant. It indicates that the value is not intended to be changed.

The translation table that is created uses the Unicode code points (sometimes called the ordinal values) for each letter in the two strings. As Unicode was designed to be backwards compatible with ASCII and because the exercise uses Latin letters, the code points in the translation table can be interpreted as ASCII. However, the functions can deal with any Unicode character. You can learn more by reading about [strings and their representation in the Exercism Python syllabus][concept-string].

The Unicode value for “G” in the first string is the key for the Unicode value of “C” in the second string, and so on.

In the to_rna() function, the translate() method is called on the input, and is passed the translation table. The output of translate() is a string where all of the input DNA characters have been replaced by their RNA complement in the translation table.

dictionary join

dictionary look-up with `join`

LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}


def to_rna(dna_strand):
    return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)

This approach starts by defining a dictionary to map the DNA values to RNA values.

Python doesn’t enforce having real constant values, but the LOOKUP dictionary is defined with all uppercase letters, which is the naming convention for a Python constant. It indicates that the value is not intended to be changed.

In the to_rna() function, the join() method is called on an empty string, and is passed the list created from a generator expression.

The generator expression iterates each character in the input, looks up the DNA character in the look-up dictionary, and outputs its matching RNA character as an element in the list.

The join() method collects the RNA characters back into a string. Since an empty string is the separator for the join(), there are no spaces between the RNA characters in the string.

A generator expression is similar to a list comprehension, but instead of creating a list, it returns a generator, and iterating that generator yields the elements on the fly.

A variant that uses a list comprehension is almost identical, but note the additional square brackets inside the join():

LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}

def to_rna(dna_strand):
    return ''.join([LOOKUP[nucleotide] for nucleotide in dna_strand])

For a relatively small number of elements, using lists is fine and may be faster, but as the number of elements increases, the memory consumption increases and performance decreases. You can read more about when to choose generators over list comprehensions to dig deeper into the topic.

As of this writing, no invalid DNA characters are in the argument to `to_rna()`, so there is no error handling required for invalid input.

Source: Exercism python/rna-transcription

💡 Lihat Solusi

def to_rna(dna_strand):
    mapping = {"G": "C", "C": "G", "T": "A", "A": "U"}
    return "".join(mapping[c] for c in dna_strand)