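I am building a Chrome extension that converts a string of nucleotides of length nlen into the corresponding amino acids.
I have done something similar in Python before, but since I am still very new to JavaScript, I am struggling to port the same logic from Python to JavaScript.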
function translateInput(n_seq) {
  // code to translate goes here
  // length of input nucleotide sequence
  var nlen = n_seq.length
  // declare initially empty amino acids string
  var aa_seq = ""
  // iterate over each chunk of three characters/nucleotides
  // to match it with the correct codon
  for (var i = 0; i < nlen; i++) {
    aa_seq.concat(codon)
  }
  // return final string of amino acids
  return aa_seq
}
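I know that I want to iterate over three characters at a time, match them to the correct amino acid, and keep appending that amino acid to the output string of amino acids (aa_seq), returning that string once the loop is complete.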
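One detail worth noting about the loop above: JavaScript strings are immutable, so `aa_seq.concat(codon)` returns a new string and leaves `aa_seq` unchanged unless the result is assigned back. Below is a minimal sketch of the intended loop shape, stepping three characters at a time. `lookupAmino` is a hypothetical placeholder for whatever codon lookup is eventually used, and it only knows two codons here for illustration.

```javascript
// Hypothetical stand-in for a real codon lookup; only two codons for brevity.
const lookupAmino = codon => ({ AAG: 'K', CAT: 'H' })[codon];

function translateSketch(n_seq) {
  let aa_seq = '';
  // Advance by 3 so each iteration sees one whole codon.
  for (let i = 0; i + 3 <= n_seq.length; i += 3) {
    const codon = n_seq.slice(i, i + 3);
    // Strings are immutable: assign the result back (concat alone changes nothing).
    aa_seq = aa_seq.concat(lookupAmino(codon));
  }
  return aa_seq;
}

console.log(translateSketch('AAGCAT')); // KH
```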
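I have also tried creating a dictionary of codon-to-amino-acid relationships, and I was wondering whether there is a way to use something like it to match each three-character codon to its amino acid: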
codon_dictionary = {
"A": ["GCA","GCC","GCG","GCT"],
"C": ["TGC","TGT"],
"D": ["GAC", "GAT"],
"E": ["GAA","GAG"],
"F": ["TTC","TTT"],
"G": ["GGA","GGC","GGG","GGT"],
"H": ["CAC","CAT"],
"I": ["ATA","ATC","ATT"],
"K": ["AAA","AAG"],
"L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
"M": ["ATG"],
"N": ["AAC","AAT"],
"P": ["CCA","CCC","CCG","CCT"],
"Q": ["CAA","CAG"],
"R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
"S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
"T": ["ACA","ACC","ACG","ACT"],
"V": ["GTA","GTC","GTG","GTT"],
"W": ["TGG"],
"Y": ["TAC","TAT"],
};
Edit: an example input string of nucleotides is "AAGCATAGAAATCGAGGG", with the corresponding output string "KHRNRG". Hope this helps!
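The first thing I would personally suggest is building a dictionary from 3-character codons to amino acids. This lets your program take a string of several codons and convert it to a string of amino acids without doing an expensive deep lookup every time. The dictionary would work like this: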
codonDict['GCA'] // 'A'
codonDict['TGC'] // 'C'
// etc
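From there, I implemented two utility functions: slide and slideStr. These are not particularly important, so I will just introduce them with a few input and output examples.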
slide (2,1) ([1,2,3,4])
// [[1,2], [2,3], [3,4]]
slide (2,2) ([1,2,3,4])
// [[1,2], [3,4]]
slideStr (2,1) ('abcd')
// ['ab', 'bc', 'cd']
slideStr (2,2) ('abcd')
// ['ab', 'cd']
With the reversed dictionary and the generic utility functions, writing codon2amino is a breeze:
// codon2amino :: String -> String
const codon2amino = str =>
  slideStr(3,3)(str)
    .map(c => codonDict[c])
    .join('')
Runnable demo
To clarify, we build codonDict once, based on aminoDict, and reuse it for every codon-to-amino-acid computation.
// your original data renamed to aminoDict
const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };
// codon dictionary derived from aminoDict
const codonDict =
  Object.keys(aminoDict).reduce((dict, a) =>
    Object.assign(dict, ...aminoDict[a].map(c => ({[c]: a}))), {})
// slide :: (Int, Int) -> [a] -> [[a]]
const slide = (n,m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0,n), ...slide(n,m) (xs.slice(m))]
}
// slideStr :: (Int, Int) -> String -> [String]
const slideStr = (n,m) => str =>
  slide(n,m) (Array.from(str)) .map(s => s.join(''))
// codon2amino :: String -> String
const codon2amino = str =>
  slideStr(3,3)(str)
    .map(c => codonDict[c])
    .join('')
console.log(codon2amino('AAGCATAGAAATCGAGGG'))
// KHRNRG
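The recursive `slide` above makes one call per window, so a very long sequence may exceed the engine's call-stack limit. Here is a loop-based sketch that produces the same windows for arrays and strings. The names `slideLoop` and `slideStrLoop` are illustrative and not part of the original answer.

```javascript
// slideLoop :: (Int, Int) -> [a] -> [[a]]
// Same windows as the recursive slide, built iteratively.
const slideLoop = (n, m) => xs => {
  const windows = [];
  for (let i = 0; i + n <= xs.length; i += m)
    windows.push(xs.slice(i, i + n));
  return windows;
};

// For strings, slice already returns substrings, so no join is needed.
const slideStrLoop = (n, m) => str => slideLoop(n, m)(str);

console.log(slideLoop(2, 1)([1, 2, 3, 4]));  // [[1,2],[2,3],[3,4]]
console.log(slideStrLoop(3, 3)('AAGCATAG')); // ['AAG','CAT']
```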
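Further explanation
"Can you clarify what these variables are supposed to represent? (n, m, xs, c, etc.)"
Our slide function gives us a sliding window over an array. It takes two parameters for the window, n (the window size) and m (the step size), plus one parameter for the array of items to iterate through, xs, which can be read as "x's", that is, the plural of x.
slide is purposefully generic: as written, it works on anything that has length and slice, such as an Array or a String, and any other iterable (anything that implements Symbol.iterator) can be passed through Array.from first, as slideStr does. This is also why we use a generic name like xs, because naming it something specific would pigeonhole us into thinking it only works with one specific type.
Other variables, like c in .map(c => codonDict[c]), are not particularly important. I named it c for codon, but we could have named it x or foo; it does not matter.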
[1,2,3,4,5].map(c => f(c))
// [f(1), f(2), f(3), f(4), f(5)]
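So what we are really doing here is taking an array ([1,2,3,4,5]) and creating a new array in which f has been called on each element of the original.
Now, when we look at .map(c => codonDict[c]), we understand that all we are doing is looking up each element c in codonDict.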
const codon2amino = str =>
  slideStr(3,3)(str)          // [ 'AAG', 'CAT', 'AGA', 'AAT', ...]
    .map(c => codonDict[c])   // [ codonDict['AAG'], codonDict['CAT'], codonDict['AGA'], codonDict['AAT'], ...]
    .join('')                 // 'KHRN...'
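One consequence of this pipeline: the dictionary from the question contains no stop codons (`TAA`, `TAG`, `TGA`), so `codonDict[c]` is `undefined` for them, and `join('')` renders `undefined` as an empty string, silently dropping that position. The sketch below makes such codons visible instead. The `'?'` placeholder and the names `demoCodonDict`, `toCodons`, and `codon2aminoStrict` are assumptions for illustration, and a small hand-written dictionary keeps the block self-contained.

```javascript
// Small subset of the codon dictionary, enough for the demonstration.
const demoCodonDict = { AAG: 'K', CAT: 'H', AGA: 'R' };

// Split into whole codons; a trailing partial codon is ignored.
const toCodons = str => str.match(/.{3}/g) ?? [];

// Unknown codons become '?' instead of vanishing in join('').
const codon2aminoStrict = str =>
  toCodons(str)
    .map(c => demoCodonDict[c] ?? '?')
    .join('');

console.log(codon2aminoStrict('AAGTAACAT')); // K?H
```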
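"Also, are these 'const' items able to essentially replace my original translateInput() function?"
If you are not familiar with ES6 (ES2015), some of the syntax used above might look foreign to you.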
// foo using traditional function syntax
function foo (x) { return x + 1 }
// foo as an arrow function
const foo = x => x + 1
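In short, yes: codon2amino is an exact replacement for your translateInput, just defined using a const binding and an arrow function. I chose codon2amino as the name because it better describes what the function does; translateInput does not say which direction it translates (A to B, or B to A?).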
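The reason you see other const declarations is that we split the function's work across multiple functions. The reasons for doing so are mostly beyond the scope of this answer, but the short explanation is that one specialized function that takes on responsibility for several tasks is less useful to us than several generic functions that can be combined and reused in sensible ways.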
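As a small illustration of that reuse, the same window helper can serve two different tasks without modification. The sketch below defines a compact `windows` helper (a hypothetical stand-in for `slideStr`, repeated so the block runs on its own) and uses it both to split codons and to list overlapping nucleotide pairs.

```javascript
// Hypothetical compact equivalent of slideStr, repeated here so the block runs alone.
const windows = (n, m) => str => {
  const out = [];
  for (let i = 0; i + n <= str.length; i += m) out.push(str.slice(i, i + n));
  return out;
};

// Task 1: non-overlapping codons (size 3, step 3).
const codonsOf = windows(3, 3);
// Task 2: overlapping nucleotide pairs (size 2, step 1).
const pairsOf = windows(2, 1);

console.log(codonsOf('AAGCAT')); // ['AAG','CAT']
console.log(pairsOf('AAGC'));    // ['AA','AG','GC']
```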
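Of course, codon2amino needs to look at each 3-letter sequence in the input string, but that does not mean we have to write the string-splitting code inside codon2amino itself.
All that said...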
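"Is there any way I can do this while keeping my original for-loop structure?"
I really think you should spend some time with the code above to understand how it works.
Of course, that is not the only way to solve the problem. We could use a plain for loop. For me, that means having to think about creating the iterator i, manually incrementing it with i++ or i += 3, making sure to check i < str.length, reassigning the return value, and so on.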
// Invert { amino: [codons] } into { codon: amino }
function makeCodonDict (aminoDict) {
  let result = {}
  for (let k of Object.keys(aminoDict))
    for (let a of aminoDict[k])
      result[a] = k
  return result
}
// Look up each 3-character chunk of str in dict
function translateInput (dict, str) {
  let result = ''
  for (let i = 0; i < str.length; i += 3)
    result += dict[str.slice(i, i + 3)]
  return result
}
const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };
const codonDict = makeCodonDict(aminoDict)
const codons = 'AAGCATAGAAATCGAGGG'
const aminos = translateInput(codonDict, codons)
console.log(aminos) // KHRNRG
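One edge case in the loop version: if the input length is not a multiple of three, the final lookup uses a partial codon, the result is `undefined`, and `+=` appends the literal text "undefined". Below is a sketch of a guarded variant. Skipping the partial codon (and any codon missing from the dictionary) is an assumption about the desired behavior, and the names `translateWholeCodons` and `tinyDict` are illustrative.

```javascript
function translateWholeCodons (dict, str) {
  let result = ''
  // Stop before a trailing fragment shorter than three characters.
  for (let i = 0; i + 3 <= str.length; i += 3) {
    const amino = dict[str.slice(i, i + 3)]
    // Skip codons missing from the dictionary rather than appending "undefined".
    if (amino !== undefined) result += amino
  }
  return result
}

const tinyDict = { AAG: 'K', CAT: 'H' }
console.log(translateWholeCodons(tinyDict, 'AAGCATAG')) // KH
```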
Alternatively, the answer by @guest271314 can be written in a more compact form:
var res = ''
str.match(/.{1,3}/g).forEach(s => {
  var key = Object.keys(codon_dictionary).filter(x => codon_dictionary[x].filter(y => y === s).length > 0)[0]
  res += key != undefined ? key : ''
})
You can see the complete answer below.
const codon_dictionary = {
"A": ["GCA","GCC","GCG","GCT"],
"C": ["TGC","TGT"],
"D": ["GAC", "GAT"],
"E": ["GAA","GAG"],
"F": ["TTC","TTT"],
"G": ["GGA","GGC","GGG","GGT"],
"H": ["CAC","CAT"],
"I": ["ATA","ATC","ATT"],
"K": ["AAA","AAG"],
"L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
"M": ["ATG"],
"N": ["AAC","AAT"],
"P": ["CCA","CCC","CCG","CCT"],
"Q": ["CAA","CAG"],
"R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
"S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
"T": ["ACA","ACC","ACG","ACT"],
"V": ["GTA","GTC","GTG","GTT"],
"W": ["TGG"],
"Y": ["TAC","TAT"],
};
const str = "AAGCATAGAAATCGAGGG";
let res = "";
// the code above, rewritten as the short answer
str.match(/.{1,3}/g).forEach(s => {
  var key = Object.keys(codon_dictionary).filter(x => codon_dictionary[x].filter(y => y === s).length > 0)[0]
  res += key != undefined ? key : ''
})
console.log(res);
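The nested `filter` calls scan every amino acid for each codon, even after a match has been found. Here is a sketch of the same lookup using `Array.prototype.find` and `Array.prototype.includes`, which both stop at the first match. The abbreviated dictionary and the names `smallDictionary`, `aminoFor`, and `translateCompact` are illustrative only.

```javascript
// Abbreviated amino-acid -> codons dictionary for the demonstration.
const smallDictionary = { K: ['AAA', 'AAG'], H: ['CAC', 'CAT'], R: ['AGA', 'AGG'] };

// Return the amino acid whose codon list contains the codon, or undefined.
const aminoFor = codon =>
  Object.keys(smallDictionary).find(k => smallDictionary[k].includes(codon));

const translateCompact = seq =>
  (seq.match(/.{1,3}/g) ?? [])
    .map(codon => aminoFor(codon) ?? '')
    .join('');

console.log(translateCompact('AAGCATAGA')); // KHR
```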
Well, I would suggest first changing the shape of your dictionary, since in its current form it is not very useful. So let's do that:
const dict = {
"A": ["GCA","GCC","GCG","GCT"],
"C": ["TGC","TGT"],
"D": ["GAC", "GAT"],
"E": ["GAA","GAG"],
"F": ["TTC","TTT"],
"G": ["GGA","GGC","GGG","GGT"],
"H": ["CAC","CAT"],
"I": ["ATA","ATC","ATT"],
"K": ["AAA","AAG"],
"L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
"M": ["ATG"],
"N": ["AAC","AAT"],
"P": ["CCA","CCC","CCG","CCT"],
"Q": ["CAA","CAG"],
"R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
"S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
"T": ["ACA","ACC","ACG","ACT"],
"V": ["GTA","GTC","GTG","GTT"],
"W": ["TGG"],
"Y": ["TAC","TAT"],
}
const codons = Object.keys(dict).reduce((a, b) => {dict[b].forEach(v => a[v] = b); return a}, {})
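// In practice, you will get the object below. It is shown for reference and is the
// same object as the one computed above, so keep only one of the two declarations: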
const codons = { GCA: 'A',
GCC: 'A',
GCG: 'A',
GCT: 'A',
TGC: 'C',
TGT: 'C',
GAC: 'D',
GAT: 'D',
GAA: 'E',
GAG: 'E',
TTC: 'F',
TTT: 'F',
GGA: 'G',
GGC: 'G',
GGG: 'G',
GGT: 'G',
CAC: 'H',
CAT: 'H',
ATA: 'I',
ATC: 'I',
ATT: 'I',
AAA: 'K',
AAG: 'K',
CTA: 'L',
CTC: 'L',
CTG: 'L',
CTT: 'L',
TTA: 'L',
TTG: 'L',
ATG: 'M',
AAC: 'N',
AAT: 'N',
CCA: 'P',
CCC: 'P',
CCG: 'P',
CCT: 'P',
CAA: 'Q',
CAG: 'Q',
AGA: 'R',
AGG: 'R',
CGA: 'R',
CGC: 'R',
CGG: 'R',
CGT: 'R',
AGC: 'S',
AGT: 'S',
TCA: 'S',
TCC: 'S',
TCG: 'S',
TCT: 'S',
ACA: 'T',
ACC: 'T',
ACG: 'T',
ACT: 'T',
GTA: 'V',
GTC: 'V',
GTG: 'V',
GTT: 'V',
TGG: 'W',
TAC: 'Y',
TAT: 'Y' }
//Now we are reasoning!
//From here on, it is pretty straightforward:
const rnaPr = s => s.match(/.{3}/g).map(fragment => codons[fragment]).join("")
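Two small notes on this approach. `String.prototype.match` with a global regular expression returns `null` when nothing matches, so `rnaPr` throws on an input shorter than three characters. Also, the inverted table can be built with `Object.entries`, `flatMap`, and `Object.fromEntries` as an alternative to `reduce`. The sketch below illustrates both, using an abbreviated dictionary and the hypothetical names `shortDict`, `invert`, `codonTable`, and `rnaPrSafe`.

```javascript
const shortDict = { K: ['AAA', 'AAG'], H: ['CAC', 'CAT'] };

// Build [codon, amino] pairs, then turn them into an object.
const invert = d =>
  Object.fromEntries(
    Object.entries(d).flatMap(([amino, list]) => list.map(c => [c, amino]))
  );

const codonTable = invert(shortDict);

// Fall back to an empty array when match() returns null.
const rnaPrSafe = s =>
  (s.match(/.{3}/g) ?? []).map(fragment => codonTable[fragment]).join('');

console.log(codonTable);          // { AAA: 'K', AAG: 'K', CAC: 'H', CAT: 'H' }
console.log(rnaPrSafe('AAGCAT')); // KH
```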
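You can use a for loop with String.prototype.slice() to iterate over the string three characters at a time from the beginning, a for..of loop with Object.entries() to iterate over the properties and values of the codon_dictionary object, and Array.prototype.includes() to match the current three-character portion of the input string against the array stored as the value of each codon_dictionary property.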
const codon_dictionary = {
"A": ["GCA","GCC","GCG","GCT"],
"C": ["TGC","TGT"],
"D": ["GAC", "GAT"],
"E": ["GAA","GAG"],
"F": ["TTC","TTT"],
"G": ["GGA","GGC","GGG","GGT"],
"H": ["CAC","CAT"],
"I": ["ATA","ATC","ATT"],
"K": ["AAA","AAG"],
"L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
"M": ["ATG"],
"N": ["AAC","AAT"],
"P": ["CCA","CCC","CCG","CCT"],
"Q": ["CAA","CAG"],
"R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
"S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
"T": ["ACA","ACC","ACG","ACT"],
"V": ["GTA","GTC","GTG","GTT"],
"W": ["TGG"],
"Y": ["TAC","TAT"],
};
const [entries, n] = [Object.entries(codon_dictionary), 3];
let [str, res] = ["AAGCATAGAAATCGAGGG", ""];
for (let i = 0; i + n <= str.length; i += n)
  for (const [key, prop, curr = str.slice(i, i + n)] of entries)
    if (prop.includes(curr)) {res += key; break;};
console.log(res);
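Since the sequence in a Chrome extension will typically come from user input, it may contain lowercase letters or whitespace that the lookups above would not match. Here is a sketch of a normalization step to run before translating. The function name `normalizeSequence` and the choice to strip all whitespace are assumptions.

```javascript
// Uppercase the input and remove spaces, tabs, and line breaks.
const normalizeSequence = raw => raw.toUpperCase().replace(/\s+/g, '');

console.log(normalizeSequence('aag cat\nAGA')); // AAGCATAGA
```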