Basic Information

Gene Symbol
-
Assembly
GCA_000001215.4
Location
JANZWZ010000289.1:312270-318905[+]

Transcription Factor Domain

TF Family
THAP
Domain
THAP domain
PFAM
PF05485
TF Group
Zinc-Coordinating Group
Description
The THAP domain is a putative DNA-binding domain (DBD) that probably also binds a zinc ion. It features a conserved C2CH architecture (consensus: Cys, 2-4 residues, Cys, 35-50 residues, Cys, 2 residues, His). Other universal features include its location at the N-terminus of proteins, a size of about 90 residues, a C-terminal AVPTIF box, and several other conserved residues. Orthologues of the human THAP domain have been identified in other vertebrates and probably in worms and flies, but not in other eukaryotes or in any prokaryotes [1].
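As a rough illustration of the C2CH spacing rule, the consensus can be written as a regular expression and scanned along a protein sequence. This is a minimal Python sketch, assuming that only the Cys/His spacing matters: it ignores the AVPTIF box and the other conserved residues, and the name `find_c2ch` is hypothetical. A match only means the spacing fits; a profile HMM search such as the hmmscan run on this page scores the whole domain and is the more reliable test.

```python
import re

# Regex form of the C2CH consensus described above:
# Cys, 2-4 residues, Cys, 35-50 residues, Cys, 2 residues, His.
# "." matches any residue; other conserved THAP positions are ignored.
C2CH = re.compile(r"C.{2,4}C.{35,50}C.{2}H")


def find_c2ch(protein: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) spans of non-overlapping C2CH matches."""
    # finditer scans left to right and reports non-overlapping matches only.
    return [(m.start() + 1, m.end()) for m in C2CH.finditer(protein.upper())]


if __name__ == "__main__":
    # Synthetic sequence (not from this record): the pattern starts at residue 3.
    demo = "MM" + "C" + "AAA" + "C" + "G" * 40 + "C" + "AA" + "H"
    print(find_c2ch(demo))  # [(3, 51)]
```

Running the same function on the Protein Sequence field would list candidate zinc-binding sites, which can then be compared with the alignment coordinates in the hmmscan table.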
Hmmscan Output
#   of  c-Evalue  i-Evalue  score  bias  hmm_from  hmm_to  ali_from  ali_to  env_from  env_to  acc
1   19  6.4e-08   0.00016   22.7   0.4   25        86      25        75      18        76      0.83
2   19  8.8e-05   0.22      12.6   0.5   1         87      103       154     103       154     0.70
3   19  4.9e-16   1.2e-12   48.7   0.2   1         87      176       248     176       248     0.85
4   19  2.3e-16   5.6e-13   49.8   5.5   1         87      329       399     329       399     0.82
5   19  3.1e-08   7.7e-05   23.7   0.1   28        86      430       477     411       478     0.80
6   19  0.00019   0.48      11.5   1.0   36        87      517       561     495       561     0.70
7   19  7.2e-11   1.8e-07   32.1   0.1   1         62      601       657     601       673     0.79
8   19  9         2.2e+04   -3.9   0.3   1         11      699       709     699       715     0.77
9   19  1.6       4.1e+03   -1.1   0.0   1         12      772       783     772       809     0.62
10  19  1.8e-10   4.3e-07   30.9   0.1   18        86      822       879     813       880     0.83
11  19  2.1e-12   5.2e-09   37.0   2.7   1         85      954       1022    954       1024    0.82
12  19  2.7e-07   0.00066   20.7   0.1   1         87      1047      1116    1047      1116    0.70
13  19  1.8e-08   4.4e-05   24.5   0.1   17        81      1213      1262    1205      1275    0.71
14  19  1.8e-11   4.5e-08   34.1   0.4   1         86      1348      1414    1348      1415    0.78
15  19  1.4e-07   0.00035   21.6   0.0   1         59      1430      1478    1430      1494    0.78
16  19  2.6e-13   6.3e-10   40.0   0.6   1         87      1507      1577    1507      1577    0.83
17  19  2.3e-12   5.8e-09   36.9   0.9   1         87      1633      1703    1633      1703    0.83
18  19  1.2e-11   2.9e-08   34.6   0.2   1         86      1738      1809    1738      1810    0.81
19  19  5.9e-05   0.15      13.2   0.0   1         40      1820      1856    1820      1865    0.85
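The domain table follows HMMER's per-domain output: c-Evalue and i-Evalue are the conditional and independent E-values, the three coordinate pairs give the HMM, alignment, and envelope positions, and acc is the mean posterior probability of the aligned residues. Several of the 19 hits are weak (domains 8 and 9 score below zero), so a common next step is to keep only hits under an i-Evalue cut-off. The following Python sketch does that and maps each alignment onto nucleotide positions within the coding sequence. The 0.01 cut-off and all helper names are illustrative assumptions, not something this page specifies; the nucleotide positions are relative to the CDS rather than the genomic Location, since the coding sequence need not be contiguous in the genome.

```python
from dataclasses import dataclass

# Rows copied from the hmmscan table on this page. Column order:
# #, of, c-Evalue, i-Evalue, score, bias, hmm coord from, hmm coord to,
# ali coord from, ali coord to, env coord from, env coord to, acc
HMMSCAN_ROWS = """\
1 19 6.4e-08 0.00016 22.7 0.4 25 86 25 75 18 76 0.83
2 19 8.8e-05 0.22 12.6 0.5 1 87 103 154 103 154 0.70
3 19 4.9e-16 1.2e-12 48.7 0.2 1 87 176 248 176 248 0.85
4 19 2.3e-16 5.6e-13 49.8 5.5 1 87 329 399 329 399 0.82
5 19 3.1e-08 7.7e-05 23.7 0.1 28 86 430 477 411 478 0.80
6 19 0.00019 0.48 11.5 1.0 36 87 517 561 495 561 0.70
7 19 7.2e-11 1.8e-07 32.1 0.1 1 62 601 657 601 673 0.79
8 19 9 2.2e+04 -3.9 0.3 1 11 699 709 699 715 0.77
9 19 1.6 4.1e+03 -1.1 0.0 1 12 772 783 772 809 0.62
10 19 1.8e-10 4.3e-07 30.9 0.1 18 86 822 879 813 880 0.83
11 19 2.1e-12 5.2e-09 37.0 2.7 1 85 954 1022 954 1024 0.82
12 19 2.7e-07 0.00066 20.7 0.1 1 87 1047 1116 1047 1116 0.70
13 19 1.8e-08 4.4e-05 24.5 0.1 17 81 1213 1262 1205 1275 0.71
14 19 1.8e-11 4.5e-08 34.1 0.4 1 86 1348 1414 1348 1415 0.78
15 19 1.4e-07 0.00035 21.6 0.0 1 59 1430 1478 1430 1494 0.78
16 19 2.6e-13 6.3e-10 40.0 0.6 1 87 1507 1577 1507 1577 0.83
17 19 2.3e-12 5.8e-09 36.9 0.9 1 87 1633 1703 1633 1703 0.83
18 19 1.2e-11 2.9e-08 34.6 0.2 1 86 1738 1809 1738 1810 0.81
19 19 5.9e-05 0.15 13.2 0.0 1 40 1820 1856 1820 1865 0.85
"""


@dataclass
class DomainHit:
    index: int       # domain number (#)
    i_evalue: float  # independent E-value
    score: float     # bit score
    ali_from: int    # first aligned residue, 1-based protein coordinate
    ali_to: int      # last aligned residue, inclusive


def parse_hits(text: str) -> list[DomainHit]:
    """Parse whitespace-separated domain rows into DomainHit records."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        hits.append(DomainHit(
            index=int(fields[0]),
            i_evalue=float(fields[3]),
            score=float(fields[4]),
            ali_from=int(fields[8]),
            ali_to=int(fields[9]),
        ))
    return hits


def protein_to_cds(aa_from: int, aa_to: int) -> tuple[int, int]:
    """Map a 1-based inclusive residue range to 1-based CDS nucleotide positions."""
    return 3 * (aa_from - 1) + 1, 3 * aa_to


# Illustrative cut-off only; this page does not say which threshold, if any, it applies.
I_EVALUE_CUTOFF = 0.01

hits = parse_hits(HMMSCAN_ROWS)
confident = [h for h in hits if h.i_evalue <= I_EVALUE_CUTOFF]

if __name__ == "__main__":
    for h in confident:
        print(h.index, h.ali_from, h.ali_to, protein_to_cds(h.ali_from, h.ali_to))
```

With the 0.01 cut-off, 14 of the 19 hits are kept; the five discarded hits (2, 6, 8, 9 and 19) are those with an i-Evalue of 0.15 or more.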

Sequence Information

Coding Sequence
ATGCACGCAGGAGCGATCGCCGGGCCCCAGGCGCACTCCTCAACGCTGGACGACTCCGAGGACGCCCTGTGCTGCGACGAGAAGTACCTCAATCAGTGGCTGCACAACCTCAAGATGTTTCACATACCCGCCGCCAGCTACGCAAATTTCCGCATCTGTAGCATGCATTTTCCGAAGCGCTGCATTAACCGCTACTCTCTGTGCTACTGGGCCGTTCCCACGTTCAACCTGGGCCATGACGACGTGGCCAATTTATACCAAAACAGGGAACTCACTAACACCTTCACCACCGGCGAGGTAGCGCGCTGCAGCATGCCTCACTGTACAAGTCAGCGGGGTGAGAGCAACTTGAAGTTCTACAACTTTCCCAAGGACATCAAAAGCTTGATCAAGTGTCGCCACTTCGAGGAGCGCTGTATTGGCAAGTTCCGTCTGAAGCCGTGGGCGGTGCCTACTTTACACCTAGGTGCCCAATATGGCAAGATCCACGATAACCCAAAGAATTTGTACGTTGAAGAAAAACGCTGCTGCCTCAACTTCTGCCGCCGGAGCCGATCCTCTGACTTCAATATGTCGCTATATCGATTTCCCAGAGATGAAGTTCTCCTGCGACGCTGGTGCTACAATCTCCGCTTGGATCCCGGAGTGTATCGTGGGAAAAATCACAAAATATGCAGCGCCCACTTTATCAAAGAGGCGTTGGGTCTGCGCAAACTATCACCAGGGGCCGTTCCGACGCTTCATCTGGGTCACACTGATACCTTCAACATCTACGAGAACGAATTGTGGCCACCGCCAACGGCACCCAACAGTCACAGCAGTGGCCTCCAGCACCAGACGCAACAACATTCCTCGCAACACTCACTGCAGCAGCAATTGCATAGCAAATCGTACCACCGGCAATCGGCGGCCTCCACGTCCTCCTCCGCCAGTTCGGGTGCCAGTGGATCTTCTGCAATGAATGCCAGCGACAGCATGGACGTATGTTGTGTGCCCAGTTGTGAGAGCAAGAGGCACAATAATGAGAACATTACATTCCACACCATACCACGTCGACCGGAGCAGATGCGCAAGTGGTGCCATAATCTGAAAATACCCGAAGAAAAGATGCACAAGGGTATGAGGATCTGCAGCCTGCACTTCGAGCCCTATTGCATCGGCGGTTGCATGCGTCCATTTGCGGTGCCTACGCTTAACTTGGGTCACGATGACGACGATATTCATAGAAATCCGGATGTGATCAAGAAGTTAAACATCCGGGAAACGTGCTGCGTCGCCGTCAATGTGTCCTTATTGACCAAGTGGTGTGGCAATCTGCAGCGTCCTGTTCCGGATGGAAGTAAACTTTTCAACGACGCCATTTGTGAAGTGCACTTTGAGGAACGATGTCTGCGCAACAAAAGACTAGAGAAGTGGGCAGTGCCGACACTATCGTTAGGCCACGAAAACATCCCATACCCGCTGCCAACGCCGGAACAGGTTACGGAGTTTTACTCTCGACCCACTGCGCCCAATAATGGCGAGGAACAGGGAGAGTGCTGTGTGGAGACGTGCAAGAGAAATCCAAGTGTGGACGACATCAAGCTTTATCGGCCGCCGGAGAAGCTTCCGATCTGTAATCTTCACTTTGAGGCACACTGCATCGGCAAGCGGATGAGACCTTGGGCTATTCCAACACTGAATCTGGCAGGCACAATAGAGAATCTCTACGAGAATCCGGAGCATTCGATGCTGTACAAGCGGCGGACCCACATGAAAGCCAAGCAATCGGCTTCCGTGAAGCCCACTTGGGTGCCCAGGTGCTGTCTTCCGCATTGCCGCAAAGTTCGGGCTCTCCACAACGTTCAGCTTTATCGCTTCCCAAAGCTCAATCGCTCCACTCTGGCTAAGTGGGCGCACAATCTGCAGGTTCCTATGGTTGGCAGTGCCCAGCGTCGTCTATGCTCGGCTCATTTCGAGCCGCACGTGCTCAGCAAAGAAGTGCCCGGTGCCGCTGGCGG
TGCCCACATTGGACTTGAATGCGCCGCCCGGCTTGAAGATTTACCAGAATCCAGCCAAGCTCTCAAGGCTAGCAAGCTGTGTCTGCAGCGCGTGTGCATTGTCGAGAGTTGTCGCAAGACGCGGGCGCAGGGCGTCAGCTCTTCCGACTGCCACATAGTCCAACGCAGCTGCGCAGTGGATGCACAACATCAAAACGCGTCCCAGAGCGGCGATGAGGGCCCAATACCCGCTGGCGCCATTCCCACCCTGGAACTGGGTCATGACGACGAGGACATCTATCCCAACGAAGCGCAGGCCTTTGCGGACGAGCACTGCGTGGTGGAGGGCTGCGAGGCATCAAGGAACAGCCTGACGCAGATTGCATCGGGCCTCAAGCACTTATACAAGTGGGCTATTCCCACCGAGGAACTGGGTCACGACGACGCTGACATCGAGCTAGTGCTAAATCCCAAGCCGGAGGACAGTCAGATGAACAGTTTTCCCAAGGACGCGAATCTCTTCGAGCGATGGAAACACAACTTGCGGTTGGAACACCTCAGCTTCCACGAACGCGATCGGTACAAGATATGCAACTCTCACTTTGAGGATATATGTATTGGGAAGACGCGGCTTAACATAGGTTCGATTCCGACTCTAGAATTGGGTCACGACGAGACAGACGATCTGTTCCAGGTAAATCCGGCGGAGCTGCAGAGCAACACTTTTCGGACGACGGCGGCGAGATTACACGACGAGTCGGGCGGTATAATCATCAAGCAGGAGTTTTCCGAGTCGGAGGACGTCAAAACGGACGTGTCTGATGCCAAAGATTTCAATACGAGACAGGTTAAGCTCAGAAAGACTATGTCCGATCTGAAGTGTTGTGTGCGCAGTTGTGGGCGCAGTCGACTGGAGCACGGAGCGCGCCTCTTTCCATTTCCACCGGCAAGCAGCAGCACCTGGAAGTGGCGCCATAACCTGCGCCTGGAACCCGACGAGGTGGACCGATCGACACGGATTTGCAGTGCGCACTTCAATCGGCGCTGCATTGATGGCAAGCAGCTGAGAAGCTGGGCAATGCCCACGCAACAACTGGGCCACCAGGAGCAGCCGATCTACGAGAATCCGAAAAACATACCAGGATTCTTTACGCCCACCTGTGCTCTGAGTCATTGCCGCAAGCGTAGGAGCATTGACAACGATCTCCGCACCTATCGATATCCGAGGGTGAGGATCTTCTGGAGAAATGGCGGGCGAATCTGCGTCTGGCGCCGGATCAGTGTCGCGGCTGGGATATGTGCTGACCATTTTGAGTCACAGGTGCGTGGAAAGTTGAAGCTGAAAACGGGAGCGGTGCCTACTCTAAATCTGGGCCATGATGAGGGCTTAATATACGACAATGAGGCTATAAAGCTCGGAGATGACACGACTGAAACCCAAAGAGAGCTGATTGATGAAGAGGAAGAAGAACTAGAGGCTGAGGAGGAGCCCCATGAGCACGATATGTACGATGAAGACGAGAAGGACGGCCACTATTTCGATCCTCTCGAACTGGAGGATGAGCGCGACGAAGATGAGGACTTGGACGAGGCGGAGCACTTTCATCCGGACAACCCACCCACTCCCCAACCATCCCTCTGCGTCGCGAAAAGCCCGCGAATAATCACCTTTGGCTTTCCCAAGGATCGCCAACTGCTGCTCAAGTGGTGCGCCAATCTACACCTGAATCCGGATGACTGCATCGGCCGCGTTTGCATAGAGCACTTTCAGCCGGAGGTATTGGGAACCCGAAAGCTGAAGCAAAATGCGGAGCAATTGCAGCCACAGCACTCGGTTTTTCGGCTTTGGAGCCTAAAACACTGCCGCAAAGAGGAAACTGACGGAGCCGCCGGACATCCGCCAAAACAAGTGGAGTGCGGAAGTGCGGAAGATGCAGAGATTGAGGATGGAAATGAAGATGAAGATAGGGAGGGAGATCAAGCTGGAGGTGCAGACGGAGAGGAAATGAAGTCCAAGGAAA
AGACTCCAACGACGAGTCTCGGAAAGATTAAGTTGGAAATATGTTGCATCAACTCCTGTGCGAATGACGACGTTAACCAACTACTTCAGCTGCCTGAGGATCAAAATCTCTTAAGAAAGTGGCAGCATAACCTAAAGTTATCCGTGGACACGGACTTCAAGGAAATCCAAGTGTGTCTAAAGCACTTTGAGGAGCAAGTGGTGCAAAACGGAAAGCCCTTGGAGCAGGCTGTACCCACCTTACAGCTAGATCAAAACAGTTGGAACATCTACAGAAACAGCGGGAATTGCCTGTTTCCAGAGTGCAGTAATTCTTCATCGGACCAGTTAAGCTTTGTTGATTTACCTGGAAATGCGGTCATAAGAGATGCCTGGATGAGTCACCTCAATTTGCCACCCAGCAGTGAGGGTCTTCTTTGTAGTGAGCACTTTATGCAACTCTTTGAACAGGTGGAGTACCCCAAGGTATTGGCTGCACAAGATTTGGAAGACTTGCAGTGGATTGTTGACGAACTTAGATGCGCTGTTCCCAGTTGTTCATCCAAATCTGATGGGGATCTTCAGCTTATCCCTTTTCCGAAAAAGGATGCTACCCTTTTGAAGTGGCTGCAAAACACAAAGATATCTTACGATCATTTAAAGCACAAAAGCTATCGCATATGTGTTCTTCATTTCGAGCCGACTTGCCTAGAGGCAAATTTTCCGAAAGCTTGGGCTATACCCACCTTGCATTTAAACCACGATGACGAGCTTCATTTGAATCTCAGGCCTGAATCTCGCAGTGGTACACCAAACAGCAACTCCAGACTAACTCCATTGAGAATTAAAACAGATCTGACCTCCTTGGGAAGTCCATGCTCGAGTGCAAGTCCTAGTCCTCGAGGCAGGATCAGGATATGTTGTATTTCCACATGTGGACAGATTGGAAGTAGTCAAGTTCGACTCTACCGCTTTCCCACCGAAGAGCAGGCCCTACTCCGGTGGCTGGTGAACACGCAGCAGCAACCTCGCATTGTGGACCCTGCAGAGCTATATGTGTGCCAATCTCACTTTGAACCAGACGCCATTTGCAAAAAACAACTTCGTTGCTGGGCAGAACCCACCTTAAACCTGGGCCACGACGGGTTTGTTATCCCCAATGCCAAGCACAATGGAAACATTGCTGGGGGCCAGGATACTGAGGAGGCGATGAGGCTTATCCGGGAGCGCTATTGCTCCGTACTGACTTGTTTCCAGGCTGAAGCTAGCGGTGTAAGACTTTATGAGTATCCCAAGGATATGCCAACTATACGAAAGTGGGCAGCCGCGTGTAGACATCGCTCCATGCAGGCCAGCAGCCATGGATTCAAGGTATGCCAGTCTCACTTCGCACCGGAATGCTTCGAGCCGGACACATTAAATTTGATTGAGGGATCCGTTCCCACTCTGGAGTTAAGTAGAGGCGACATCGAAAGACACTGCCTAGTGTCTGGATGTGAAAAGGATGCATCTGAAGGACGTCTGCGCTATTACAAGGTGCCAAAAACCACTGCTCAACTGAATGCTTGGAGCAACAACCTGAAGATCAGTTGCCAGGACCTCGGATTGGGGAGCAGCTCATCTGTGAGCGTCACTTTGAGCCCTTTTTGCTTCGGTGCCCACAAAGGGATTACGCCCTGGCGCACTGCCGACTCTCATGCTAGGCCACGACGAAGAGGTGGAGATGTTACCGAACCCAGAAATTCTCTGGCAGAAAAAAGCCGAGGTTTGCTGTGCCACTGCATGTGGTCGAATATGGCAGCCTGGAGACCCTAA
Protein Sequence
MHAGAIAGPQAHSSTLDDSEDALCCDEKYLNQWLHNLKMFHIPAASYANFRICSMHFPKRCINRYSLCYWAVPTFNLGHDDVANLYQNRELTNTFTTGEVARCSMPHCTSQRGESNLKFYNFPKDIKSLIKCRHFEERCIGKFRLKPWAVPTLHLGAQYGKIHDNPKNLYVEEKRCCLNFCRRSRSSDFNMSLYRFPRDEVLLRRWCYNLRLDPGVYRGKNHKICSAHFIKEALGLRKLSPGAVPTLHLGHTDTFNIYENELWPPPTAPNSHSSGLQHQTQQHSSQHSLQQQLHSKSYHRQSAASTSSSASSGASGSSAMNASDSMDVCCVPSCESKRHNNENITFHTIPRRPEQMRKWCHNLKIPEEKMHKGMRICSLHFEPYCIGGCMRPFAVPTLNLGHDDDDIHRNPDVIKKLNIRETCCVAVNVSLLTKWCGNLQRPVPDGSKLFNDAICEVHFEERCLRNKRLEKWAVPTLSLGHENIPYPLPTPEQVTEFYSRPTAPNNGEEQGECCVETCKRNPSVDDIKLYRPPEKLPICNLHFEAHCIGKRMRPWAIPTLNLAGTIENLYENPEHSMLYKRRTHMKAKQSASVKPTWVPRCCLPHCRKVRALHNVQLYRFPKLNRSTLAKWAHNLQVPMVGSAQRRLCSAHFEPHVLSKEVPGAAGGAHIGLECAARLEDLPESSQALKASKLCLQRVCIVESCRKTRAQGVSSSDCHIVQRSCAVDAQHQNASQSGDEGPIPAGAIPTLELGHDDEDIYPNEAQAFADEHCVVEGCEASRNSLTQIASGLKHLYKWAIPTEELGHDDADIELVLNPKPEDSQMNSFPKDANLFERWKHNLRLEHLSFHERDRYKICNSHFEDICIGKTRLNIGSIPTLELGHDETDDLFQVNPAELQSNTFRTTAARLHDESGGIIIKQEFSESEDVKTDVSDAKDFNTRQVKLRKTMSDLKCCVRSCGRSRLEHGARLFPFPPASSSTWKWRHNLRLEPDEVDRSTRICSAHFNRRCIDGKQLRSWAMPTQQLGHQEQPIYENPKNIPGFFTPTCALSHCRKRRSIDNDLRTYRYPRVRIFWRNGGRICVWRRISVAAGICADHFESQVRGKLKLKTGAVPTLNLGHDEGLIYDNEAIKLGDDTTETQRELIDEEEEELEAEEEPHEHDMYDEDEKDGHYFDPLELEDERDEDEDLDEAEHFHPDNPPTPQPSLCVAKSPRIITFGFPKDRQLLLKWCANLHLNPDDCIGRVCIEHFQPEVLGTRKLKQNAEQLQPQHSVFRLWSLKHCRKEETDGAAGHPPKQVECGSAEDAEIEDGNEDEDREGDQAGGADGEEMKSKEKTPTTSLGKIKLEICCINSCANDDVNQLLQLPEDQNLLRKWQHNLKLSVDTDFKEIQVCLKHFEEQVVQNGKPLEQAVPTLQLDQNSWNIYRNSGNCLFPECSNSSSDQLSFVDLPGNAVIRDAWMSHLNLPPSSEGLLCSEHFMQLFEQVEYPKVLAAQDLEDLQWIVDELRCAVPSCSSKSDGDLQLIPFPKKDATLLKWLQNTKISYDHLKHKSYRICVLHFEPTCLEANFPKAWAIPTLHLNHDDELHLNLRPESRSGTPNSNSRLTPLRIKTDLTSLGSPCSSASPSPRGRIRICCISTCGQIGSSQVRLYRFPTEEQALLRWLVNTQQQPRIVDPAELYVCQSHFEPDAICKKQLRCWAEPTLNLGHDGFVIPNAKHNGNIAGGQDTEEAMRLIRERYCSVLTCFQAEASGVRLYEYPKDMPTIRKWAAACRHRSMQASSHGFKVCQSHFAPECFEPDTLNLIEGSVPTLELSRGDIERHCLVSGCEKDASEGRLRYYKVPKTTAQLNAWSNNLKISCQDLGLGSSSSVSVTLSPFCFGAHKGITPWRTADSHARPRRRGGDVTEPRNSLAEKSRGLLCHCMWSNMAAWRP
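The coding sequence is displayed wrapped across several lines, so it has to be joined before it can be checked against the protein sequence. A minimal Python sketch of that check, assuming the standard genetic code (NCBI translation table 1) and a CDS that is in frame from its first base; `translate` is a hypothetical helper written for this example, not a library function:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first codon position varying slowest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a DNA coding sequence, ignoring whitespace and line breaks."""
    seq = "".join(cds.split()).upper()  # re-join the wrapped display lines
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for ambiguous codons
        if aa == "*" and to_stop:
            break  # stop at the first stop codon
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # First 21 nt of the Coding Sequence field above.
    print(translate("ATGCACGCAGGAGCGATCGCC"))  # MHAGAIA
```

Translating the full joined CDS and comparing it residue by residue with the Protein Sequence field is a quick consistency check; a mismatch would point to a frameshift or a copying error.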

Similar Transcription Factors

Clustering by sequence similarity using MMseqs2

100% Identity
-
90% Identity
-
80% Identity
-