Basic Information

Insect
Dorcus hopei
Gene Symbol
topi
Assembly
GCA_033060865.1
Location
CM065425.1:77115711-77145315[+]
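The Location value appears to follow a `sequence:start-end[strand]` convention. Below is a minimal Python sketch for splitting it into fields. It assumes 1-based, inclusive coordinates, which the page does not state. The names `GenomicLocation` and `parse_location` are hypothetical helpers, not part of any database API.

```python
import re
from typing import NamedTuple


class GenomicLocation(NamedTuple):
    seqid: str
    start: int
    end: int
    strand: str

    @property
    def span(self) -> int:
        # Assumption: coordinates are 1-based and inclusive at both ends.
        return self.end - self.start + 1


# Hypothetical format: "<sequence id>:<start>-<end>[<strand>]"
LOCATION_RE = re.compile(
    r"^(?P<seqid>[^:]+):(?P<start>\d+)-(?P<end>\d+)\[(?P<strand>[+-])\]$"
)


def parse_location(text: str) -> GenomicLocation:
    """Split a location string such as 'CM065425.1:77115711-77145315[+]'."""
    m = LOCATION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised location: {text!r}")
    return GenomicLocation(m["seqid"], int(m["start"]), int(m["end"]), m["strand"])


loc = parse_location("CM065425.1:77115711-77145315[+]")
print(loc.seqid, loc.strand, loc.span)
```

Under the 1-based inclusive assumption, the locus spans 29,605 bp on the plus strand. That is far longer than the coding sequence given below, and the difference presumably reflects introns and/or untranslated regions.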

Transcription Factor Domain

TF Family
zf-C2H2
Domain
zf-C2H2 domain
PFAM
PF00096
TF Group
Zinc-Coordinating Group
Description
The C2H2 zinc finger is the classical zinc finger domain. Its two conserved cysteines and two conserved histidines coordinate a zinc ion. The zinc finger is described by the pattern #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C], where X can be any amino acid and the numbers give the number of residues (a range when shown in parentheses). The positions marked # are important for the stable fold of the zinc finger. The final position can be either His or Cys. The C2H2 zinc finger is composed of two short beta strands followed by an alpha helix; in DNA-binding zinc fingers, the amino-terminal part of the helix binds the major groove of the DNA. The accepted consensus binding sequence for Sp1 is usually defined by the asymmetric hexanucleotide core GGGCGG, but this sequence does not include, among others, the GAG (=CTC) repeat that constitutes a high-affinity site for Sp1 binding to the wt1 promoter [1].
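The pattern in the description can be turned into a regular expression to locate candidate C2H2 fingers in a protein. Below is a minimal Python sketch. It treats each `#` position as "any residue" because the description does not say which residues occupy those positions, so it is looser than the real motif. Pfam itself scores domains with a profile HMM (see the hmmscan output below), not with this regex, so the sketch is only a rough approximation. `pattern_to_regex` and `find_c2h2` are hypothetical names.

```python
import re

# Pattern copied from the zf-C2H2 description above.
PFAM_C2H2_PATTERN = "#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]"

# Token kinds: X(a-b) ranges, Xn fixed runs, [..] alternatives, single residues or '#'.
TOKEN_RE = re.compile(r"X\(\d+-\d+\)|X\d+|\[[A-Z/]+\]|[A-Z#]")


def pattern_to_regex(pattern: str) -> str:
    """Translate the PROSITE-like pattern into a Python regular expression."""
    parts = []
    for tok in TOKEN_RE.findall(pattern):
        if tok in ("X", "#"):
            # Assumption: '#' (fold-stabilising position) is treated as any residue.
            parts.append(".")
        elif (m := re.fullmatch(r"X\((\d+)-(\d+)\)", tok)):
            parts.append(f".{{{m[1]},{m[2]}}}")  # variable-length gap
        elif (m := re.fullmatch(r"X(\d+)", tok)):
            parts.append(f".{{{m[1]}}}")  # fixed-length gap
        elif tok.startswith("["):
            parts.append("[" + tok[1:-1].replace("/", "") + "]")  # [H/C] -> [HC]
        else:
            parts.append(tok)  # literal residue such as C or H
    return "".join(parts)


C2H2_RE = re.compile(pattern_to_regex(PFAM_C2H2_PATTERN))


def find_c2h2(protein: str) -> list[tuple[int, int, str]]:
    """Return non-overlapping matches as 1-based inclusive (start, end, segment)."""
    return [(m.start() + 1, m.end(), m.group()) for m in C2H2_RE.finditer(protein.upper())]


print(pattern_to_regex(PFAM_C2H2_PATTERN))
print(find_c2h2("YQCNRCDFRFSSLAELSEHIKAVH"))
```

The test segment is residues 107 to 130 of the protein below, which hmmscan reports as domain 2. The regex matches it end to end.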
Hmmscan Out
# of c-Evalue i-Evalue score bias hmm_from hmm_to ali_from ali_to env_from env_to acc
1 35 0.0028 0.31 13.1 2.8 1 22 79 100 79 100 0.95
2 35 0.00023 0.025 16.5 0.3 1 23 107 130 107 130 0.96
3 35 0.00015 0.016 17.1 3.0 1 23 136 158 136 158 0.99
4 35 0.0057 0.64 12.1 4.5 1 23 164 187 164 187 0.98
5 35 5.8 6.4e+02 2.6 2.7 2 23 201 223 200 223 0.91
6 35 0.00048 0.053 15.5 2.5 1 23 229 251 229 251 0.99
7 35 0.28 32 6.7 0.9 3 23 259 279 257 279 0.96
8 35 5.8 6.4e+02 2.6 1.6 1 20 285 304 285 307 0.93
9 35 3.5 3.9e+02 3.3 0.8 1 23 375 398 375 398 0.83
10 35 0.26 29 6.9 0.0 1 23 411 433 411 433 0.96
11 35 0.0082 0.91 11.6 1.2 1 23 437 459 437 459 0.97
12 35 0.33 37 6.5 0.3 1 23 465 487 465 487 0.89
13 35 0.019 2.1 10.5 0.9 1 23 493 517 493 517 0.96
14 35 0.00045 0.05 15.6 2.2 1 23 523 546 523 546 0.96
15 35 0.00063 0.07 15.1 2.6 1 23 552 574 552 574 0.98
16 35 0.08 8.9 8.5 1.7 1 23 579 598 579 598 0.77
17 35 5.1e-05 0.0057 18.5 2.2 1 23 604 626 604 626 0.98
18 35 4.8e-05 0.0053 18.6 4.2 1 21 632 652 632 654 0.96
19 35 0.57 64 5.8 4.7 1 23 682 704 682 704 0.96
20 35 0.0027 0.3 13.1 1.1 1 23 710 732 710 732 0.98
21 35 0.00025 0.028 16.4 0.7 1 23 738 760 738 760 0.99
22 35 0.01 1.1 11.3 2.5 1 23 766 788 766 788 0.98
23 35 4.4e-05 0.0049 18.7 4.9 1 23 794 816 794 816 0.98
24 35 7.9e-06 0.00087 21.1 0.7 1 23 822 844 822 844 0.99
25 35 0.0059 0.65 12.1 1.7 1 23 848 870 848 870 0.98
26 35 0.085 9.4 8.4 0.2 2 23 906 927 905 927 0.96
27 35 1e-05 0.0012 20.7 2.4 2 23 956 977 955 977 0.97
28 35 0.0038 0.42 12.6 5.1 1 23 983 1005 983 1005 0.98
29 35 3.6e-07 4e-05 25.3 1.7 1 23 1011 1033 1011 1033 0.99
30 35 0.67 74 5.6 0.6 2 23 1040 1062 1039 1062 0.90
31 35 0.00026 0.029 16.3 1.9 1 23 1067 1089 1067 1089 0.97
32 35 0.0018 0.19 13.7 0.8 1 23 1099 1122 1099 1122 0.97
33 35 0.00014 0.016 17.1 0.1 1 23 1130 1152 1130 1152 0.97
34 35 9.1e-07 0.0001 24.0 0.5 2 23 1159 1180 1158 1180 0.98
35 35 1.9e-05 0.0021 19.9 1.0 1 23 1186 1208 1186 1208 0.98
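Each row above lists 13 whitespace-separated fields in HMMER's per-domain layout. The conditional E-value (c-Evalue) is the more permissive measure and the independent E-value (i-Evalue) the more stringent one. Below is a minimal Python sketch that parses rows into records and keeps domains under an i-Evalue cutoff. The cutoff value is an illustrative assumption, not the threshold the database used. `DomainHit`, `parse_hit` and `significant` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    index: int       # "#": domain number
    total: int       # "of": number of domains reported
    c_evalue: float  # conditional E-value
    i_evalue: float  # independent E-value
    score: float     # bit score
    bias: float      # composition-bias correction (bits)
    hmm_from: int    # model coordinates of the alignment
    hmm_to: int
    ali_from: int    # protein coordinates of the alignment (1-based, inclusive)
    ali_to: int
    env_from: int    # protein coordinates of the domain envelope
    env_to: int
    acc: float       # mean posterior probability of aligned residues


def parse_hit(line: str) -> DomainHit:
    f = line.split()
    if len(f) != 13:
        raise ValueError(f"expected 13 fields, got {len(f)}: {line!r}")
    return DomainHit(
        int(f[0]), int(f[1]),
        float(f[2]), float(f[3]), float(f[4]), float(f[5]),
        *(int(x) for x in f[6:12]),
        float(f[12]),
    )


def significant(hits: list[DomainHit], max_i_evalue: float) -> list[DomainHit]:
    """Keep domains whose independent E-value is at or below the cutoff."""
    return [h for h in hits if h.i_evalue <= max_i_evalue]


# First five rows of the table above, copied verbatim.
ROWS = [
    "1 35 0.0028 0.31 13.1 2.8 1 22 79 100 79 100 0.95",
    "2 35 0.00023 0.025 16.5 0.3 1 23 107 130 107 130 0.96",
    "3 35 0.00015 0.016 17.1 3.0 1 23 136 158 136 158 0.99",
    "4 35 0.0057 0.64 12.1 4.5 1 23 164 187 164 187 0.98",
    "5 35 5.8 6.4e+02 2.6 2.7 2 23 201 223 200 223 0.91",
]

hits = [parse_hit(r) for r in ROWS]
print([h.index for h in significant(hits, max_i_evalue=0.1)])
```

With the illustrative cutoff of 0.1, only domains 2 and 3 of these five rows pass.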

Sequence Information

Coding Sequence
ATGACGGGTCTGGTCGACTTGCCGGTTATATTGTATTTCACCAGCAGTACAATATCGCCGGGGAGCGGTCGCAGTGAAAAGCACTGgacATTAAATGCCCTCTCTTGGTGCTgggatttaataaaaaaagaagtggAAACACGAAGCCCCGATTCAGGAACCGACGTACCGTACGATTTCCACCTCAAAGTGAAAGCAGAGGCGGATTTGGAACCTGAAACGCTCGGGGAGTCCTTTCGGTGCAAAAACTGCAAACGAGTCTTCTTCTCTTCGGCGAGCTTGAAGGAGCACGTGTGCACACGCACCGGCGAAAAACCATACCAATGCAATCGTTGCGATTTTAGATTTAGTTCTCTTGCGGAACTGAGCGAACATATCAAGGCCGTCCACGCCGCCGAAAAGGTTTACAGATGCAGCGTTTGCCAGCACGATTTTTGCAGCCCCGAGGGTCTAAAAGTGCACATCCGTAGACACACAGGCGAAAAACCGTATAAGTGCACGCTTTGTAATTACTGCAGTGCCCATTCCAGCGGGTTAAAAGGGCACATGCTCAAGTACCATCCCGACGAAATTCCGCCGAAAGTGCGCGGCACGTCTTTAAACTGCAAACATTGTAGTTTCATCGGCAGGACAACGCAAGAATTCAAAAAGCATTACACCGAAGTACATCCGGGCCAGAAGCCTTACAAGTGTACATTCTGTGATTACCGTAGTGTCCTACGTGACACTCTGAGGAAACATATGAATACGCATGCGAACGAAACTATTTACGGATGCAGCGCGTGCGATTTCAAATGCATCTCACCCGAAGAGCTAAGCAAACACAACCAAACGCACGTATTAGAGAAACCGTTTCAATGTGGAATCTGCATGTACAAGTGTGCAACATACAGTAGATTACAAGAGCACACGTGGATGCATAGCGGCATGAAACTTTTTAAGTGCGACTTATGTAGatttaaaaaaatcagcaaAAAGCGACCGAAAAACAAAAGCGAAGGCGACGGCATCAACTTGCCCCTGGAAGTGATCATAAAAGTCGAAGCCGATAAGGAGGCGGCCAATTCGAGTTTGCCCAGAAAAGGCAAAATCAAACGGAAGAAGCAAGACGCAATAAAAACGTATATCTGCACGATATGTTCTGTGAGCACGAGGATGACGAAGAGAGAAATGAATGCGCACTATAGAACTCACAGGTGTAAGAAGTCGCACGAGAAACCGAACAGGATGTTCGTGTGTGACATTTGCTTGGAGAACTTACCCTCCGCCGAAGCACTAGAGGATCACATAGAAGAGCACGAGGACCAATACAAGTGCAAGATCTGCCAGAGGGGGTTTAAAAAGATACTGGATTACACGTACCACATACAAGCCCACAGTGACGATAAGCTTTACAGGTGTCCGATATGTGACTACACCATGTCAAGCCGACGTAGAGTTAGGCCTCACATTGCAACGCACAACTTCTTCAAAAAGTACGAGTGCCCCGTCGAGGGCTGCAAGAAGGGCTTCACGGTGCTAACCCACTACGAGGAGCACAAGAACTTCCACACCGGCGAAAAACCGTTCGTCTGCGAAATCTGCGGGACGTCTTTCATGTACTCGCACTACCTCACCATACACAGGACCAAGCAGCACAAACCGGAGAGGCAGTACAAGTGTTCGGTTTGTTCGGAGACCTTCACCGGTTTTAAGAACCTCAGGACGCACAAGCTGCGCCACGAACGGTCGGACCACATCTGCAAGATTTGCGGCAAAAGCTATAAGCAGCTGAACTCGCACGAGCTGACGCACACCGACCATAAGCCGCACAAGTGCAGCTATTGCGAGAAAACGTTCAGGCTGAAGGGGAACTTGGTTGATCACGAACGGATTCACACCGGTTGCAGGCCGTACAGTTGTACGGTGTGCGCGAAAAATTTTACGCAAAAGTCTAGCTTGAACAGGCACATGCGTTGTCATGTTAACGAGGAAATCCCCGAAGTAAAACCACAATCGCC
CCTCCAGGAACTGCCCACGTCATCCCAGTCCAAAGACAAACCGTACAACTGCGATTCCTgcaattttaaaacgtacagGTCCGATCATTTTAAACTGCACTTAGTGAAACACGCAGGCGGAAAACCTTACGCTTGTAACTTGTGCGATTATAAGAGCAGATTCCGTTCGAATTTAGGTGTGCACTTGAAACGACACGCCGGCGAAAGGCCTTTTAAGTGCGGGATGTGCGATTACAAAGCCACCGAATCTGCGACGCTCAGGAAACACATGATGACACACTCCGACGAGAAACCCTTTCAATGCGGCACGTGCGAGTTTAGATGTAAGGACGCGGCGCGCTTGAAATCGCACATGTTAAGGCACACCACCGAAACGCCGTTCCAGTGCGAAatctgcgattacaaatgtaaAACATCTTCTTACTTAAAGAAGCACCTCTTGAAACACACCAGCGGAGAGCTGTTTAAGTGTGGGGTGTGCGACGCGAAATTCACAGAGCCCGCGCACTTGAAGCAGCACATGTTAACACACGGCGGGCGGTTTAAATGTAAGAAGTGTGATTTTAAAACTGCCGAATCTAGGTCGTTGAAGGAGCACGTGCTGACGCACGCCGGCGATCAGCCGTTCAAGTGCGCGATTTGCGCTTTTAAGGCGCACAAATTCAAAACGTTGAAGAACCACGTCAAGTGGAGGCACACGGGTAGGGAACTCATCAAGTGCGACCTCTGCGATCATAAGGTGGTTCGACCTAAGGATTTGAAGCCCCATATGGTAGTACATACCGGTGAAAAACCTTATCAATGTAATCTTTGCGATGGCAGGCAAGAGGATCTCGCCTCCGCAAAACCATCGAAAAACGTCAGTCAGTGTCACGTTTGTTCGAAAACTTTCGCCCGATGGCGCCTTCTAAAGCGGCACCTGGTGACACACACCAGCGAAAGGCCTTTCAAATGCGACTTGTGCTCGAAGCACTTCCGGCGCAACTTCGAGATGCTCGCGCACAGGAAATACCACAGCGACGCTAAACCCTTCAAGTGCGATGTGTGCGATAAGTCGTTTAAACGGAAGGGCGCCCTCGAGAACCACCGGAAGAGGCACATGAAGGATTACAAAGTTGTGTGCGAGGTTTGCCAGGTCGGCTTCTATTCGAACTGCGAGTACAAACTCCACCTCGGCTATAAGCACGGGCAGGGCAAGTTTATTTGCGAAATTTGCGCCAAGTCTTTCTACAATAAGTTCGGACTCAATCAGCACATGCACCTGCACCGGGAGTGTTACAAGGACGAAAGAAAGTACACGTGCGACATATGCAATAAGGGATTCTTGCAGCCGATTTATTTGAGGGCGCACCACAACAAAGTGCACAAGGACAGTTCGAGGAAACACTACGTTTGTGATTTGTGCGGGAAAGAGGTTTCTTCGAGCACTAGTTTGAGGGAACATCAGTTGCTACACGAGGGTGTTAAACCCAATAAGTGCGACGTGTGCGAGAAGAGTTTCGCGTCCAGGTCGAATCTCGTCGTGCACTTGCGGacgcacaccggtgagaagccataCGAATGTCGGGACTGTGGCAAGTCGTTTACTCAGAGGAACACTTTCGTCATACACGCGCGTCAACATTCCGGGGAACGTCCTTATATCTGTCAGTTGTGCCGTGAAGAGGTATCACTCACCATATTCAAGGGGATGAAACTAGGACGACCCTGGAGGAATGCTGAAGAACAGGACAAGGATCTACAAGGATGGGGATGCAAGCTAAAGATGTTACGAGGAGATATGAGATGCCACAGCTCTCTTACTTGTCCAAATAAATAG
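The coding sequence is split across two lines on this page, so the two lines should be joined before translation. The mixed upper and lower case most likely reflects soft-masking inherited from the genome assembly. Case carries no coding meaning, so the sketch upper-cases its input. Below is a minimal Python sketch of translation with the standard genetic code (NCBI table 1). Using table 1 for this insect gene is an assumption. `translate` is a hypothetical helper.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame CDS, stopping at the first stop codon."""
    seq = cds.upper()  # lowercase (soft-masked) bases code normally
    if len(seq) % 3:
        raise ValueError("CDS length is not a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGACGGGTCTGGTCGAC"))  # start of the CDS
print(translate("CACTGgacATTAAATGCC"))  # soft-masked stretch
print(translate("ACTTGTCCAAATAAATAG"))  # end of the CDS, including the stop
```

The three fragments come from the start, a soft-masked stretch, and the end of the coding sequence above. They should translate to MTGLVD, HWTLNA and TCPNK, the corresponding parts of the Protein Sequence field.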
Protein Sequence
MTGLVDLPVILYFTSSTISPGSGRSEKHWTLNALSWCWDLIKKEVETRSPDSGTDVPYDFHLKVKAEADLEPETLGESFRCKNCKRVFFSSASLKEHVCTRTGEKPYQCNRCDFRFSSLAELSEHIKAVHAAEKVYRCSVCQHDFCSPEGLKVHIRRHTGEKPYKCTLCNYCSAHSSGLKGHMLKYHPDEIPPKVRGTSLNCKHCSFIGRTTQEFKKHYTEVHPGQKPYKCTFCDYRSVLRDTLRKHMNTHANETIYGCSACDFKCISPEELSKHNQTHVLEKPFQCGICMYKCATYSRLQEHTWMHSGMKLFKCDLCRFKKISKKRPKNKSEGDGINLPLEVIIKVEADKEAANSSLPRKGKIKRKKQDAIKTYICTICSVSTRMTKREMNAHYRTHRCKKSHEKPNRMFVCDICLENLPSAEALEDHIEEHEDQYKCKICQRGFKKILDYTYHIQAHSDDKLYRCPICDYTMSSRRRVRPHIATHNFFKKYECPVEGCKKGFTVLTHYEEHKNFHTGEKPFVCEICGTSFMYSHYLTIHRTKQHKPERQYKCSVCSETFTGFKNLRTHKLRHERSDHICKICGKSYKQLNSHELTHTDHKPHKCSYCEKTFRLKGNLVDHERIHTGCRPYSCTVCAKNFTQKSSLNRHMRCHVNEEIPEVKPQSPLQELPTSSQSKDKPYNCDSCNFKTYRSDHFKLHLVKHAGGKPYACNLCDYKSRFRSNLGVHLKRHAGERPFKCGMCDYKATESATLRKHMMTHSDEKPFQCGTCEFRCKDAARLKSHMLRHTTETPFQCEICDYKCKTSSYLKKHLLKHTSGELFKCGVCDAKFTEPAHLKQHMLTHGGRFKCKKCDFKTAESRSLKEHVLTHAGDQPFKCAICAFKAHKFKTLKNHVKWRHTGRELIKCDLCDHKVVRPKDLKPHMVVHTGEKPYQCNLCDGRQEDLASAKPSKNVSQCHVCSKTFARWRLLKRHLVTHTSERPFKCDLCSKHFRRNFEMLAHRKYHSDAKPFKCDVCDKSFKRKGALENHRKRHMKDYKVVCEVCQVGFYSNCEYKLHLGYKHGQGKFICEICAKSFYNKFGLNQHMHLHRECYKDERKYTCDICNKGFLQPIYLRAHHNKVHKDSSRKHYVCDLCGKEVSSSTSLREHQLLHEGVKPNKCDVCEKSFASRSNLVVHLRTHTGEKPYECRDCGKSFTQRNTFVIHARQHSGERPYICQLCREEVSLTIFKGMKLGRPWRNAEEQDKDLQGWGCKLKMLRGDMRCHSSLTCPNK
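The hmmscan `ali_from`/`ali_to` columns give 1-based, inclusive positions in this protein. Below is a minimal Python sketch that extracts the aligned segment for a domain. To keep it short, it includes only the first 130 residues of the Protein Sequence field, copied verbatim. `domain_segment` is a hypothetical helper.

```python
# First 130 residues of the Protein Sequence field above (copied verbatim).
PROTEIN_PREFIX = (
    "MTGLVDLPVILYFTSSTISPGSGRSEKHWTLNALSWCWDLIKKEVETRSPDSGTDVPYDF"
    "HLKVKAEADLEPETLGESFRCKNCKRVFFSSASLKEHVCTRTGEKPYQCNRCDFRFSSLA"
    "ELSEHIKAVH"
)


def domain_segment(protein: str, ali_from: int, ali_to: int) -> str:
    """Return residues ali_from..ali_to (1-based, inclusive) of the protein."""
    if not 1 <= ali_from <= ali_to <= len(protein):
        raise ValueError(f"range {ali_from}-{ali_to} outside 1-{len(protein)}")
    return protein[ali_from - 1:ali_to]  # convert to Python's 0-based, half-open slice


print(domain_segment(PROTEIN_PREFIX, 79, 100))   # hmmscan domain 1
print(domain_segment(PROTEIN_PREFIX, 107, 130))  # hmmscan domain 2
```

Domain 2 (residues 107 to 130) shows the expected C-X2-C ... H-X4-H spacing. Domain 1 matches only model positions 1 to 22 and ends in HVCT.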

Similar Transcription Factors

Clustering by sequence similarity, computed with MMseqs2

100% Identity
-
90% Identity
-
80% Identity
-