Basic Information

Gene Symbol
-
Assembly
GCA_947063395.1
Location
OX346722.1:339330-340829[-]
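The location string above packs a sequence accession, a coordinate range, and a strand into one field. A minimal sketch of how it could be unpacked, assuming the coordinates are 1-based and inclusive (the record does not state the convention); the function name `parse_location` is hypothetical:

```python
import re

def parse_location(loc):
    """Split 'ACCESSION:START-END[STRAND]' into its parts.

    Assumption: START and END are 1-based, inclusive coordinates,
    so the span length is END - START + 1.
    """
    m = re.fullmatch(r"([^:\s]+):(\d+)-(\d+)\[([+-])\]", loc)
    if m is None:
        raise ValueError(f"unrecognised location string: {loc!r}")
    seqid, start, end, strand = m.groups()
    start, end = int(start), int(end)
    return {
        "seqid": seqid,
        "start": start,
        "end": end,
        "strand": strand,
        "length": end - start + 1,
    }

loc = parse_location("OX346722.1:339330-340829[-]")
print(loc["seqid"], loc["strand"], loc["length"])
```

Under the inclusive-coordinate assumption, the span for this record is 1500 bp on the minus strand.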

Transcription Factor Domain

TF Family
GCM
Domain
GCM domain
PFAM
PF03615
TF Group
Beta-Scaffold Factors
Description
GCM transcription factors are a family of proteins that contain a GCM motif. The GCM motif is a domain identified in a family of transcriptional regulators, comprising Drosophila melanogaster GCM and its mammalian homologues, that are involved in fundamental developmental processes [PMID: 8962155, PMID: 9114061, PMID: 9580683, PMID: 10671510]. In GCM transcription factors, the N-terminal moiety contains a DNA-binding domain of 150 residues. Sequence conservation is highest in this GCM domain. In contrast, the C-terminal moiety contains one or two transactivating regions and is only poorly conserved. The GCM motif has been shown to be a DNA-binding domain that preferentially recognises the nonpalindromic octamer 5'-ATGCGGGT-3' [PMID: 8962155, PMID: 9114061, PMID: 9580683]. The GCM motif contains many conserved basic amino acid residues, seven cysteine residues, and four histidine residues [PMID: 8962155]. The conserved cysteines are involved in shaping the overall conformation of the domain, in the process of DNA binding, and in the redox regulation of DNA binding [PMID: 9580683]. The GCM domain is a new class of Zn-containing DNA-binding domain with no similarity to any other DNA-binding domain [PMID: 12682016]. The GCM domain consists of a large and a small domain tethered together by one of the two Zn ions present in the structure. The large and the small domains comprise five- and three-stranded beta-sheets, respectively, with three small helical segments packed against the same side of the two beta-sheets. The GCM domain exercises a novel mode of sequence-specific DNA recognition, in which the five-stranded beta-pleated sheet inserts into the major groove of the DNA. Residues protruding from the edge strand of the beta-pleated sheet and from the following loop and strand contact the bases and backbone of both DNA strands, providing specificity for the DNA target site.
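Because the preferred octamer 5'-ATGCGGGT-3' is nonpalindromic, a site can occur in either orientation and both strands have to be checked. The sketch below illustrates this with exact string matching only; that is a simplifying assumption, since the description says the octamer is recognised preferentially, not exclusively. The names `find_gcm_sites` and `reverse_complement` are hypothetical helpers, not part of any tool referenced by this record.

```python
GCM_SITE = "ATGCGGGT"  # preferred octamer quoted in the description
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def find_gcm_sites(seq, motif=GCM_SITE):
    """Return (0-based start, strand) for each exact motif occurrence.

    Assumption: exact matches only; degenerate or weaker sites
    are not reported by this toy scan.
    """
    seq = seq.upper()
    rc_motif = reverse_complement(motif)
    width = len(motif)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits

# Made-up test sequence with one site on each strand.
print(find_gcm_sites("ccATGCGGGTaaACCCGCATgg"))
```

A position weight matrix would be the more usual choice for real site prediction; exact matching is used here only to keep the example short.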
Hmmscan Out
#   of  c-Evalue  i-Evalue  score  bias  hmm_from  hmm_to  ali_from  ali_to  env_from  env_to   acc
1   13    0.0081   1.2e+02    3.3   0.1        62      79        15      32         9      36  0.80
2   13    0.0031        46    4.6   0.5        51      77        41      66        34      71  0.63
3   13    0.0041        60    4.2   0.5        59      79        66      86        56      90  0.69
4   13    0.0023        33    5.1   0.8        51      78        77     103        68     107  0.64
5   13    0.0034        50    4.5   0.3        58      78       101     121        95     125  0.71
6   13    0.0027        39    4.9   0.3        52      78       114     139       110     143  0.72
7   13    0.0032        47    4.6   0.4        52      78       132     157       129     161  0.72
8   13    0.0027        39    4.8   0.3        52      78       150     175       146     179  0.72
9   13    0.0024        36    5.0   0.3        52      79       168     194       164     199  0.73
10  13    0.0029        42    4.7   0.3        52      78       186     211       182     214  0.72
11  13     0.002        29    5.3   0.5        51      78       203     229       195     233  0.65
12  13    0.0027        39    4.8   0.4        52      78       222     247       218     251  0.72
13  13    0.0023        33    5.1   0.3        52      79       240     266       236     274  0.75
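Each row of the hmmscan output holds thirteen whitespace-separated fields: the domain index and count, the conditional and independent E-values, score, bias, the HMM, alignment, and envelope coordinates, and the alignment accuracy. A minimal sketch of reading such rows into typed records is shown below; the class `DomainHit` and function `parse_hit` are hypothetical names, and the two example rows are copied from the table above.

```python
from typing import NamedTuple

class DomainHit(NamedTuple):
    index: int       # domain number ("#")
    total: int       # number of domains reported ("of")
    c_evalue: float  # conditional E-value
    i_evalue: float  # independent E-value
    score: float
    bias: float
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int
    env_from: int
    env_to: int
    acc: float       # mean posterior accuracy of the alignment

def parse_hit(line):
    """Parse one whitespace-separated domain row into a DomainHit."""
    f = line.split()
    if len(f) != 13:
        raise ValueError(f"expected 13 fields, got {len(f)}")
    return DomainHit(
        int(f[0]), int(f[1]),
        float(f[2]), float(f[3]), float(f[4]), float(f[5]),
        *(int(x) for x in f[6:12]),
        float(f[12]),
    )

rows = [
    "1 13 0.0081 1.2e+02 3.3 0.1 62 79 15 32 9 36 0.80",
    "11 13 0.002 29 5.3 0.5 51 78 203 229 195 233 0.65",
]
hits = [parse_hit(r) for r in rows]
best = min(hits, key=lambda h: h.i_evalue)
print(best.index, best.i_evalue, best.ali_to - best.ali_from + 1)
```

Sorting or filtering on `i_evalue` in this way makes it easy to see which of the thirteen reported domains has the strongest individual support.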

Sequence Information

Coding Sequence
ATGTCATTCGGAGCTCTCTTTTACTATAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGTGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGATAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGTTCACAGCAGCGCGTGCGGGTGGTGCACTTGCGGCCCGTGGTGTGCGACAGTAG
Protein Sequence
MSFGALFYYSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQCVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVVCDSSSQQRVRVVHLRPVFTAARAGGALAARGVRQ
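The protein sequence is the conceptual translation of the coding sequence. The sketch below shows one way to check that relationship on the first 54 nucleotides of the CDS, assuming the standard genetic code (the record does not state which translation table was used); `translate` is a hypothetical helper written for this example.

```python
from itertools import product

# Standard genetic code, codons ordered by base T, C, A, G at each position.
# Assumption: the standard code applies to this record.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate a CDS in frame 0, stopping at the first stop codon.

    Codons containing characters other than A/C/G/T become 'X';
    a trailing partial codon is ignored.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

cds_prefix = "ATGTCATTCGGAGCTCTCTTTTACTATAGTTCACAGCAGCGCGTGCGGGTGGTG"
print(translate(cds_prefix))
```

The same function can be applied to the full coding sequence and the result compared with the listed protein sequence.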

Similar Transcription Factors

Sequence clustering based on sequence similarity using MMseqs2

100% Identity
-
90% Identity
-
80% Identity
-