Aals019392.1
Basic Information
- Insect: Agonopterix alstromeriana
- Gene Symbol: rx1
- Assembly: GCA_963924505.1
- Location: OZ004684.1:25889-55302[+]
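The location string above packs a sequence accession, a coordinate range and a strand into one field. A minimal sketch of splitting it apart follows; the `Location` type and `parse_location` helper are hypothetical names introduced here, and the layout `accession:start-end[strand]` is inferred from this single entry rather than from any documented format.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical container for the parts of a location string."""
    contig: str
    start: int
    end: int
    strand: str


# Assumed layout: <accession>:<start>-<end>[<strand>], e.g. OZ004684.1:25889-55302[+]
_LOCATION_RE = re.compile(
    r"^(?P<contig>[^:]+):(?P<start>\d+)-(?P<end>\d+)\[(?P<strand>[+-])\]$"
)


def parse_location(text: str) -> Location:
    """Split a location string into contig, start, end and strand."""
    match = _LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised location string: {text!r}")
    return Location(
        contig=match["contig"],
        start=int(match["start"]),
        end=int(match["end"]),
        strand=match["strand"],
    )


loc = parse_location("OZ004684.1:25889-55302[+]")
print(loc.contig, loc.start, loc.end, loc.strand)
```

Whether the coordinates are 1-based and inclusive is not stated in the record, so any span length derived from `start` and `end` should be treated as approximate.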
Transcription Factor Domain
- TF Family: Homeobox
- Domain: Homeobox
- PFAM: PF00046
- TF Group: Helix-turn-helix
- Description
- This entry represents the homeodomain (HD), a protein domain of approximately 60 residues that usually binds DNA. It is encoded by the homeobox sequence [7, 6, 8], which was first identified in a number of Drosophila homeotic and segmentation proteins, but is now known to be well conserved in many other animals, including vertebrates [1, 2], as well as plants [4], fungi [5] and some species of lower eukaryotes. Many members of this group are transcriptional regulators, some of which operate differential genetic programs along the anterior-posterior axis of animal bodies [3]. This domain folds into a globular structure with three α-helices connected by two short loops that harbour a hydrophobic core. The second and third helices form a helix-turn-helix (HTH) motif, which makes intimate contacts with the DNA: while the first helix of this motif helps to stabilise the structure, the second helix binds to DNA via a number of hydrogen bonds and hydrophobic interactions, which occur between specific side chains and the exposed bases and thymine methyl groups within the major groove of the DNA. One particularity of the HTH motif in some of these proteins arises from the stereochemical requirement for glycine in the turn, which is needed to avoid steric interference of the β-carbon with the main chain: for cro and repressor proteins the glycine appears to be mandatory, while for many of the homeotic and other DNA-binding proteins the requirement is relaxed.
- Hmmscan Out
-
#   of  c-Evalue  i-Evalue  score  bias  hmm from  hmm to  ali from  ali to  env from  env to  acc
1   22  5.4e-16   1.2e-13   51.1   0.8   1         46      103       148     103       148     0.98
2   22  6.9e-05   0.015     15.6   0.0   21        45      147       171     147       172     0.95
3   22  1.7e-10   3.8e-08   33.5   0.2   11        45      171       205     169       206     0.96
4   22  6.1e-11   1.3e-08   35.0   0.2   11        46      205       240     203       240     0.97
5   22  1.4e-11   3.1e-09   37.0   0.2   9         46      255       292     250       292     0.95
6   22  1.4e-11   3.1e-09   37.0   0.2   9         46      307       344     302       344     0.95
7   22  1.4e-11   3.1e-09   37.0   0.2   9         46      359       396     354       396     0.95
8   22  4.1e-11   9e-09     35.5   0.3   9         45      411       447     406       448     0.95
9   22  1.7e-10   3.7e-08   33.5   0.2   11        45      447       481     445       482     0.96
10  22  1.2e-10   2.6e-08   34.0   0.2   11        46      481       516     479       516     0.96
11  22  6.9e-05   0.015     15.6   0.0   21        45      515       539     515       540     0.95
12  22  1.6e-10   3.4e-08   33.6   0.3   11        45      539       573     537       575     0.96
13  22  1.5e-10   3.3e-08   33.7   0.2   11        45      573       607     571       609     0.96
14  22  1.7e-10   3.7e-08   33.5   0.2   11        45      607       641     605       642     0.96
15  22  1.2e-10   2.7e-08   34.0   0.2   10        45      640       675     638       677     0.95
16  22  1.7e-10   3.7e-08   33.5   0.2   11        45      675       709     673       710     0.96
17  22  1.2e-10   2.6e-08   34.0   0.2   11        46      709       744     707       744     0.96
18  22  2.5e-05   0.0055    17.0   0.0   21        46      743       768     743       768     0.96
19  22  4.1e-11   9e-09     35.5   0.3   9         45      783       819     778       820     0.95
20  22  1.7e-10   3.7e-08   33.5   0.2   11        45      819       853     817       854     0.96
21  22  1.7e-10   3.7e-08   33.5   0.2   11        45      853       887     851       888     0.96
22  22  5.7e-22   1.3e-19   70.3   2.3   10        57      886       933     884       933     0.97
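The hmmscan rows above each carry 13 whitespace-separated values (domain number, domain count, two E-values, score, bias, three coordinate pairs and the accuracy). A minimal sketch of loading such rows and separating strong from weak homeobox hits follows. The `DomainHit` class, the `parse_row` helper and the `1e-5` i-Evalue cutoff are illustrative assumptions, not part of the database or of HMMER; the three sample rows are domains 1, 2 and 22 copied from this entry.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """Hypothetical record for one per-domain hmmscan row."""
    index: int
    total: int
    c_evalue: float
    i_evalue: float
    score: float
    bias: float
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int
    env_from: int
    env_to: int
    acc: float


def parse_row(line: str) -> DomainHit:
    """Parse one row of 13 whitespace-separated values."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 fields, got {len(fields)}: {line!r}")
    return DomainHit(
        int(fields[0]), int(fields[1]),
        float(fields[2]), float(fields[3]),
        float(fields[4]), float(fields[5]),
        *(int(value) for value in fields[6:12]),
        float(fields[12]),
    )


# Domains 1, 2 and 22 from the table in this entry.
rows = """
1 22 5.4e-16 1.2e-13 51.1 0.8 1 46 103 148 103 148 0.98
2 22 6.9e-05 0.015 15.6 0.0 21 45 147 171 147 172 0.95
22 22 5.7e-22 1.3e-19 70.3 2.3 10 57 886 933 884 933 0.97
"""

hits = [parse_row(line) for line in rows.strip().splitlines()]

# Assumed cutoff for illustration only; choose a threshold that suits the analysis.
I_EVALUE_CUTOFF = 1e-5
strong = [hit.index for hit in hits if hit.i_evalue <= I_EVALUE_CUTOFF]
print(strong)
```

With this cutoff, domain 2 (i-Evalue 0.015) is set aside as a weak partial match, while domains 1 and 22 are kept.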
Sequence Information
- Coding Sequence
- ATGTCAGTAGTGCGACACGACAGCAGTTCGGCGCCGCGCGCGCACTCCATAGAGCAGATCCTGGCGCGACCCGACCCCGCGCCCCCCTCACACCTCGCCAGGCACAGAGATCGTGTAGATCTGAAGAGTGAAAGTAGCGCGCACTCGGACTCAGatcacgagcacgagcacgagcacgagcacgagcacgagcgtgatcacgagcacgagcacgagtTAGAGCACGAGCAGGAGGGCGCACATGAGCACCTCGAGCACGGCGACCAGCTGGAGCCGCTCGACGCCGGCCGACCGCGCAAGCAGGTGCGCCGCAGCCGCACGACGTTCACCACGTATCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGGTCAGtacctcactcactcactcactcactcactcactcactcacacgTATCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGGTCAGtacctcactcactcactcactcactcactcactcactcacacgTATCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGGTCAGtacctcactcactcactcactcactcactcactcactcacacgTATCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGGTCAGtacctcactcactcactcactcactcactcactcactcacacgTATCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGC
CTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGGTCAGtacctcactcactcactcactcactcactcactcactcacacgTATCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGCTGCACGAGCTGGAGCGCGCGTTCGACAAGACGCAGTACCCCGACGTGTTCACCCGCGAGGAGCTCGCGCTGCGCCTGGACCTCAGCGAGGCGAGGGTGCAGGTGTGGTTCCAGAACCGACGCGCGAAGTGGCGCAAGAGAGAGAAGGCGCTCGGCCGCGACCACGCGCCCTTCCTACACCACGACCACGCAGTGGGGGAGTGGAGCCCGGCGGGCGCGGAGTGGTGGAGCGTGGGGGCGGGCGCGCTGCTGCCGGCGCCGCTGTGGCCCGACGAGCCCGCCGCCGCCTTCCGCGCGCTGCTGCACCGCTCACCCTCAGCGTCTCCCCACAGGTACGTGCTGGCGCTGCCCCCCCCCGCGCTGCTGGCGGGGCGCGCCTCGCCCCCCCGCGCGCCCACGCCTCCCAGCCCGGCTCATGCGCACCCGCTCGCGCTCGCGGCGCGCGCCCCCGGTGCGAGCGGGACGGACACACTGCGGCTGCGGCACGAGGCGCTGCTGCAGGAGCGCGGCCAGGTACACACGTAG
- Protein Sequence
- MSVVRHDSSSAPRAHSIEQILARPDPAPPSHLARHRDRVDLKSESSAHSDSDHEHEHEHEHEHERDHEHEHELEHEQEGAHEHLEHGDQLEPLDAGRPRKQVRRSRTTFTTYQLHELERAFDKTQYPDVFTREELALRLDLSEARVQTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQVSTSLTHSLTHSLTHTYQLHELERAFDKTQYPDVFTREELALRLDLSEARVQVSTSLTHSLTHSLTHTYQLHELERAFDKTQYPDVFTREELALRLDLSEARVQVSTSLTHSLTHSLTHTYQLHELERAFDKTQYPDVFTREELALRLDLSEARVQVSTSLTHSLTHSLTHTYQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQTQYPDVFTREELALRLDLSEARVQVSTSLTHSLTHSLTHTYQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQLHELERAFDKTQYPDVFTREELALRLDLSEARVQVWFQNRRAKWRKREKALGRDHAPFLHHDHAVGEWSPAGAEWWSVGAGALLPAPLWPDEPAAAFRALLHRSPSASPHRYVLALPPPALLAGRASPPRAPTPPSPAHAHPLALAARAPGASGTDTLRLRHEALLQERGQVHT
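The coding and protein sequences above can be checked against each other by translating the coding sequence. The sketch below assumes the standard genetic code and treats lowercase bases (which appear in the coding sequence, presumably as soft-masked repeats) as ordinary bases; the `translate` helper is a hypothetical name, and only the first and last few codons of the entry are used as examples.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 0; '*' marks a stop codon.

    Lowercase bases are upper-cased, codons containing non-ACGT characters
    become 'X', and a trailing partial codon is ignored.
    """
    seq = "".join(cds.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First six codons and last four codons of the coding sequence in this entry.
print(translate("ATGTCAGTAGTGCGACAC"))  # MSVVRH
print(translate("GTACACACGTAG"))        # VHT*
```

Both fragments agree with the start (`MSVVRH`) and end (`VHT`, followed by the stop codon) of the protein sequence listed above.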
Similar Transcription Factors
Sequence clustering based on sequence similarity using MMseqs2
- 100% Identity: -
- 90% Identity: -
- 80% Identity: -