Basic Information

Gene Symbol
-
Assembly
GCA_950005045.1
Location
OX465446.1:10960299-10963777[+]

Transcription Factor Domain

TF Family
Homeobox
Domain
Homeobox
PFAM
PF00046
TF Group
Helix-turn-helix
Description
This entry represents the homeodomain (HD), a protein domain of approximately 60 residues that usually binds DNA. It is encoded by the homeobox sequence [7, 6, 8], which was first identified in a number of Drosophila homeotic and segmentation proteins, but is now known to be well conserved in many other animals, including vertebrates [1, 2], as well as in plants [4], fungi [5] and some species of lower eukaryotes. Many members of this group are transcriptional regulators, some of which operate differential genetic programs along the anterior-posterior axis of animal bodies [3]. The domain folds into a globular structure with three α-helices, connected by two short loops, that harbour a hydrophobic core. The second and third helices form a helix-turn-helix (HTH) motif, which makes intimate contacts with the DNA: the first helix of this motif helps to stabilise the structure, while the second helix binds to DNA via a number of hydrogen bonds and hydrophobic interactions, which occur between specific side chains and the exposed bases and thymine methyl groups within the major groove of the DNA. One particularity of the HTH motif in some of these proteins arises from the stereochemical requirement for glycine in the turn, which is needed to avoid steric interference of the β-carbon with the main chain: for cro and repressor proteins the glycine appears to be mandatory, while for many of the homeotic and other DNA-binding proteins the requirement is relaxed.
Hmmscan Out
# of c-Evalue i-Evalue score bias hmm_from hmm_to ali_from ali_to env_from env_to acc
1 27 1.4 6.4e+02 1.8 0.0 18 39 11 32 9 37 0.90
2 27 0.99 4.7e+02 2.2 0.0 17 39 47 69 46 74 0.90
3 27 0.99 4.7e+02 2.2 0.0 17 39 84 106 83 111 0.90
4 27 1.2 5.8e+02 1.9 0.0 17 39 121 143 120 146 0.91
5 27 0.99 4.7e+02 2.2 0.0 17 39 164 186 163 191 0.90
6 27 0.99 4.7e+02 2.2 0.0 17 39 201 223 200 228 0.90
7 27 0.99 4.7e+02 2.2 0.0 17 39 238 260 237 265 0.90
8 27 0.99 4.7e+02 2.2 0.0 17 39 275 297 274 302 0.90
9 27 0.99 4.7e+02 2.2 0.0 17 39 312 334 311 339 0.90
10 27 0.99 4.7e+02 2.2 0.0 17 39 349 371 348 376 0.90
11 27 0.99 4.7e+02 2.2 0.0 17 39 386 408 385 413 0.90
12 27 0.99 4.7e+02 2.2 0.0 17 39 423 445 422 450 0.90
13 27 0.99 4.7e+02 2.2 0.0 17 39 460 482 459 487 0.90
14 27 0.99 4.7e+02 2.2 0.0 17 39 497 519 496 524 0.90
15 27 0.99 4.7e+02 2.2 0.0 17 39 534 556 533 561 0.90
16 27 0.99 4.7e+02 2.2 0.0 17 39 571 593 570 598 0.90
17 27 0.99 4.7e+02 2.2 0.0 17 39 608 630 607 635 0.90
18 27 0.99 4.7e+02 2.2 0.0 17 39 645 667 644 672 0.90
19 27 0.99 4.7e+02 2.2 0.0 17 39 682 704 681 709 0.90
20 27 0.99 4.7e+02 2.2 0.0 17 39 719 741 718 746 0.90
21 27 0.99 4.7e+02 2.2 0.0 17 39 756 778 755 783 0.90
22 27 0.99 4.7e+02 2.2 0.0 17 39 793 815 792 820 0.90
23 27 0.99 4.7e+02 2.2 0.0 17 39 830 852 829 857 0.90
24 27 1.2 5.8e+02 1.9 0.0 17 39 867 889 866 892 0.91
25 27 0.99 4.7e+02 2.2 0.0 17 39 910 932 909 937 0.90
26 27 0.99 4.7e+02 2.2 0.0 17 39 947 969 946 974 0.90
27 27 1.2 5.8e+02 1.9 0.0 17 39 984 1006 983 1009 0.91
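The rows above are whitespace-separated, with thirteen values per domain hit. As a minimal sketch (not part of the database or of HMMER itself), the Python below reads such rows into named fields and computes the distance between successive alignment start positions. The class name DomainHit, the function parse_domain_row and the field names are illustrative assumptions; only the first three rows are copied in for brevity.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One row of the per-domain table (field names are illustrative labels)."""
    index: int       # '#'  : ordinal of this domain hit
    total: int       # 'of' : total number of domain hits
    c_evalue: float
    i_evalue: float
    score: float
    bias: float
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int
    env_from: int
    env_to: int
    acc: float


# Positions of the integer-valued columns; all others are parsed as floats.
_INT_COLUMNS = {0, 1, 6, 7, 8, 9, 10, 11}


def parse_domain_row(line: str) -> DomainHit:
    """Split one whitespace-separated row into a DomainHit."""
    parts = line.split()
    if len(parts) != 13:
        raise ValueError(f"expected 13 columns, got {len(parts)}: {line!r}")
    values = [int(p) if i in _INT_COLUMNS else float(p)
              for i, p in enumerate(parts)]
    return DomainHit(*values)


# First three rows of the table above, copied verbatim.
rows = [
    "1 27 1.4 6.4e+02 1.8 0.0 18 39 11 32 9 37 0.90",
    "2 27 0.99 4.7e+02 2.2 0.0 17 39 47 69 46 74 0.90",
    "3 27 0.99 4.7e+02 2.2 0.0 17 39 84 106 83 111 0.90",
]
hits = [parse_domain_row(r) for r in rows]

# Distance between consecutive alignment start coordinates.
spacing = [b.ali_from - a.ali_from for a, b in zip(hits, hits[1:])]
print(spacing)  # [36, 37]
```

Feeding all 27 rows to the same function would give the spacing for the whole protein.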

Sequence Information

Coding Sequence
ATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGTTGTACTTACCGAGCCAAGTAGTACTTACCGAGCCAAGTGACTGCATCTCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTACAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACT
GCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGTTGTACTTACCGAGCCAAGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAGTACTTACCGAGCCAAGTGACTGCATCCCTGACGCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGTTGTACTTACCGAGCCAAGTGACTGCATCCCTGACCCTCTGTATAGATCCGAGCACGATCTCAGCGTTCAGCATGTCGGGCAGCTTGCTGACCAGCTGCGACTGCTACAGGTAG
Protein Sequence
MSGSLLTSCDCYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQLYLPSQVVLTEPSDCISDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQLYLPSQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQVVLTEPSDCIPDALYRSEHDLSVQHVGQLADQLRLLQLYLPSQVTASLTLCIDPSTISAFSMSGSLLTSCDCYR

Similar Transcription Factors

Sequence clustering based on sequence similarity using MMseqs2

100% Identity
-
90% Identity
-
80% Identity
-