Identification, Structural Analysis, and Expression Profile of Genes Related to Starch Metabolism in Cassava (Manihot esculenta Crantz)

Starch metabolism is an important pathway in plant growth and development. This study was conducted to perform a genome-wide identification and structural analysis of the genes encoding uridine diphosphate glucose pyrophosphorylase (UGPase), a key enzyme in starch synthesis, in cassava, and to analyze the expression profiles of these genes using publicly available RNA-seq data. A total of 11 members of the UGPase gene family (MeUGP) were identified in cassava. Ten of the MeUGP genes were mapped onto chromosomes of the current cassava genome assembly. The genomic DNA sequences of the MeUGP genes ranged from 3,200 to 11,601 bp in length, while their coding sequences (CDS) ranged from 831 to 3,654 bp. Based on the RNA-seq data, most MeUGP genes were expressed in at least one tissue under normal conditions. Interestingly, MeUGP4 was highly expressed in the shoot apical meristem, while MeUGP10 was more specific to the root apical meristem. The expression profiles of the MeUGP genes should be examined under various conditions in further studies.


Introduction
Cassava (Manihot esculenta Crantz) is considered a major multipurpose crop in Vietnam. Many parts of the cassava plant can be used as staple food for humans, as animal feed, and as raw materials for industrial production (Ceballos et al., 2004; Cutting, 1978). Starch, the major storage form of glucose, is an important energy resource for various biological processes during the growth and development of cassava plants (Li et al., 2016).
The basic pathway of starch metabolism begins with CO2 fixation, followed by transitory starch degradation, sucrose synthesis, and starch synthesis in the storage organs of the plant (Saithong et al., 2013). A number of enzymes involved in starch metabolism in tuber crops have been identified (Van Harsselaar et al., 2017). Recent studies have focused on the proteomic profiling and functional characterization of several genes associated with starch metabolism in cassava (Chen et al., 2015; Wang et al., 2016). Among these enzymes, UDP-glucose pyrophosphorylase (UGPase) is a core enzyme that has been shown to play an important role in starch regulation in the tuber (Van Harsselaar et al., 2017). However, the role of UGPase in cassava is still poorly understood. Therefore, this study was conducted to perform a genome-wide identification and structural analysis of the genes encoding UGPase in cassava and to analyze their expression based on publicly available RNA-seq data.

Identification of UGPase in cassava
At3G03250 (UGPase1 of Arabidopsis thaliana), collected from a previous study (Van Harsselaar et al., 2017), was used as the query sequence for a BlastP search against the current cassava proteome (Bredeson et al., 2016) in Phytozome 12 (Goodstein et al., 2012). All hits with E-values < 1 × 10⁻⁶ were then checked for the presence of the UGPase domain using the Pfam database (Finn et al., 2016). The protein sequences of the confirmed hits were collected for further analysis.
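The two-step filter described above (E-value cutoff, then domain confirmation) can be sketched as follows. This assumes, hypothetically, that the BlastP search was run locally with BLAST+ and saved in tabular format (`-outfmt 6`, where column 2 is the subject ID and column 11 the E-value), and that the set of proteins carrying a UGPase domain was already obtained from a Pfam domain scan. The function name `filter_ugp_candidates` is illustrative only; the study itself used the Phytozome and Pfam web resources, not this code.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-6  # threshold stated in the text


def filter_ugp_candidates(blast_tsv: str, domain_ids: set, cutoff: float = E_VALUE_CUTOFF) -> list:
    """Return sorted subject IDs that pass the E-value cutoff AND carry a UGPase domain.

    blast_tsv  -- BLAST+ tabular output (-outfmt 6, 12 default columns)
    domain_ids -- protein IDs with a UGPase domain hit (hypothetical input,
                  e.g. parsed from a Pfam scan)
    """
    passing = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, comment, or truncated lines
        subject_id, evalue = row[1], float(row[10])
        if evalue < cutoff:
            passing.add(subject_id)
    # Keep only BLAST hits that are also confirmed by the domain search
    return sorted(passing & set(domain_ids))


# Hypothetical example: protA and protB pass the cutoff, but only protA has the domain
example = (
    "AtUGP1\tprotA\t85\t470\t60\t2\t1\t470\t1\t470\t1e-150\t800\n"
    "AtUGP1\tprotB\t40\t300\t150\t5\t1\t300\t5\t305\t1e-20\t95\n"
    "AtUGP1\tprotC\t25\t120\t80\t4\t1\t120\t9\t129\t0.5\t30\n"
)
print(filter_ugp_candidates(example, {"protA", "protC"}))  # ['protA']
```

Intersecting the two ID sets keeps the logic explicit: a protein must satisfy both criteria, which mirrors the order of checks in the text.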

Annotation of the MeUGP genes in cassava
The general annotation information of the MeUGP genes, including GeneID, locus name, and TranscriptID, was collected from the cassava genome assembly in NCBI (Bredeson et al., 2016). The genomic DNA sequence and the coding DNA sequence (CDS) of each MeUGP gene were downloaded for further analyses.

Chromosomal distribution of the MeUGP genes in the cassava genome
The location of each MeUGP gene was retrieved from the cassava genome (Bredeson et al., 2016) in Phytozome (Goodstein et al., 2012). The physical size of each cassava chromosome was determined from the current cassava genome assembly (BioProject: PRJNA234389) in NCBI (Bredeson et al., 2016). The chromosomal distribution map of the MeUGP genes was drawn using Adobe Illustrator.
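Before drawing such a map, the gene counts per chromosome and the relative position of each gene can be tabulated programmatically. The sketch below is hypothetical: the `Locus` records and chromosome sizes are made-up placeholders rather than the actual MeUGP coordinates, and the 5% end window used to flag genes near chromosome ends is an arbitrary assumption, not a definition of "subtelomeric" taken from this study.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Locus(NamedTuple):
    gene: str
    chrom: Optional[str]  # None for genes on unplaced scaffolds
    start: int            # 1-based start coordinate
    end: int              # 1-based end coordinate


def summarize_distribution(loci, chrom_sizes, end_window=0.05):
    """Count genes per chromosome, flag genes near chromosome ends,
    and list genes that are not anchored to any chromosome.

    end_window -- fraction of chromosome length treated as "near an end"
                  (hypothetical threshold chosen for illustration).
    """
    placed = [locus for locus in loci if locus.chrom is not None]
    counts = Counter(locus.chrom for locus in placed)
    near_end = []
    for locus in placed:
        midpoint = (locus.start + locus.end) / 2
        relative = midpoint / chrom_sizes[locus.chrom]  # 0 = start, 1 = end
        if relative <= end_window or relative >= 1 - end_window:
            near_end.append(locus.gene)
    unplaced = [locus.gene for locus in loci if locus.chrom is None]
    return counts, near_end, unplaced


# Placeholder data only; not the real MeUGP coordinates
sizes = {"Chr02": 1_000_000, "Chr16": 2_000_000}
loci = [
    Locus("geneA", "Chr02", 10_000, 12_000),
    Locus("geneB", "Chr02", 500_000, 510_000),
    Locus("geneC", "Chr16", 1_990_000, 1_995_000),
    Locus("geneD", None, 1, 5_000),
]
counts, near_end, unplaced = summarize_distribution(loci, sizes)
print(dict(counts), near_end, unplaced)
```

The relative positions (0 to 1) can be used directly to place gene markers along scaled chromosome bars in a drawing program.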

Structural analysis of the MeUGP genes in cassava
The genomic DNA sequence length, CDS length, and GC content of the MeUGP genes were analyzed using BioEdit software (Hall, 1999). The exon/intron organization of the MeUGP genes was visualized using GSDS 2.0 (http://gsds.cbi.pku.edu.cn/) (Hu et al., 2015).
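BioEdit and GSDS were used in the study; the same basic quantities (genomic length, CDS length, GC content, exon and intron counts) can also be computed in a few lines of plain Python. This is a hypothetical sketch that assumes plus-strand, 1-based inclusive coordinates for exons and CDS segments; minus-strand genes would additionally need reverse-complementing, which is not handled here.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def splice(genomic: str, segments) -> str:
    """Concatenate 1-based, inclusive segments of a plus-strand sequence."""
    return "".join(genomic[start - 1:end] for start, end in sorted(segments))


def gene_stats(genomic: str, exons, cds_segments) -> dict:
    """Summarize one gene; exons may include UTRs, cds_segments are coding parts only."""
    cds = splice(genomic, cds_segments)
    return {
        "genomic_bp": len(genomic),
        "genomic_gc": gc_content(genomic),
        "cds_bp": len(cds),
        "cds_gc": gc_content(cds),
        "exons": len(exons),
        "introns": max(len(exons) - 1, 0),  # introns lie between consecutive exons
    }


# Toy gene: two coding exons separated by a 10-bp intron
toy = "ATGGCC" + "TTTTTTTTTT" + "GGGTAA"
stats = gene_stats(toy, exons=[(1, 6), (17, 22)], cds_segments=[(1, 6), (17, 22)])
print(stats["genomic_bp"], stats["cds_bp"], stats["exons"], stats["introns"])  # 22 12 2 1
```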

Phylogenetic analysis of UGPase in cassava
Full-length protein sequences of the UGPases were used to construct an unrooted phylogenetic tree using the neighbor-joining method in MEGA 7.0 (Kumar et al., 2016). The resulting tree was then drawn in Adobe Illustrator.
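To illustrate what the neighbor-joining step does, the sketch below implements the textbook Saitou–Nei algorithm on a precomputed distance matrix, together with a simple uncorrected p-distance for pre-aligned protein sequences. This is not MEGA's implementation: MEGA additionally offers substitution models, gap-handling options, and bootstrap support, none of which are reproduced here, and the function names are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Build an unrooted NJ tree and return it as a Newick string.

    names -- list of at least three labels
    dist  -- symmetric distance matrix (list of lists) in the same order
    """
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Pick the pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        _, i, j = min(
            ((n - 2) * d[i][j] - r[i] - r[j], i, j)
            for i in range(n) for j in range(i + 1, n)
        )
        # Branch lengths from the two joined nodes to their new parent node
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[p][q] for q in keep] + [new_row[x]] for x, p in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the last three nodes as a trifurcation (unrooted tree)
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


# Toy additive matrix generated from the tree ((A:1,B:2):3,(C:4,D:5))
names = ["A", "B", "C", "D"]
dist = [[0, 3, 8, 9], [3, 0, 9, 10], [8, 9, 0, 9], [9, 10, 9, 0]]
print(neighbor_joining(names, dist))  # (C:4,D:5,(A:1,B:2):3);
```

Because the toy matrix is additive, NJ recovers the generating branch lengths exactly; real protein distances are rarely additive, and NJ can then return small negative branch lengths, which programs such as MEGA handle in their own ways.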

Expression profiles of the MeUGP genes in cassava under normal conditions
The expression profiles of the MeUGP gene family in various organs/tissues under normal conditions were analyzed using previously published RNA-seq data (Wilson et al., 2017). Five tissues were studied: fibrous root, root apical meristem (RAM), shoot apical meristem (SAM), friable embryogenic callus (FEC), and organized embryogenic structure (OES) (Wilson et al., 2017). Detection criteria followed Wilson et al. (2017): an FPKM value of 1 was taken to represent "below the limit of detection", an FPKM value of 10 corresponded to "expressed", and an FPKM value of ≥100 corresponded to "highly expressed".
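As an illustration of how these thresholds can be applied, the sketch below computes FPKM from a fragment count using the standard definition (fragments per kilobase of transcript per million mapped fragments) and then classifies the value. The label for values between 1 and 10 FPKM is a hypothetical placeholder, since the criteria above do not name that range, and treating exactly 1 FPKM as not detected is likewise an assumption.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def classify_expression(value: float) -> str:
    """Apply the detection criteria (Wilson et al., 2017) as described in the text."""
    if value >= 100:
        return "highly expressed"
    if value >= 10:
        return "expressed"
    if value > 1:
        return "low (unnamed range, hypothetical label)"
    return "below the limit of detection"


# 1,000 fragments on a 2,000-bp transcript in a library of 10 million mapped fragments
value = fpkm(1_000, 2_000, 10_000_000)
print(value, classify_expression(value))  # 50.0 expressed
```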

Identification, annotation, and chromosomal distribution of genes encoding UGPase in cassava
To provide initial information about the genes encoding UGPase in cassava, At3G03250 (AtUGPase1) was used as the query for a BlastP search against the current cassava proteome (Bredeson et al., 2016) in Phytozome (Goodstein et al., 2012), and the resulting hits were annotated using the cassava genome assembly in NCBI (Bredeson et al., 2016). As a result, a total of 11 genes encoding UGPase (MeUGP) were identified in the cassava genome (Table 1).
Next, the chromosomal distribution of the MeUGP genes was determined in the current cassava assembly. Of the 11 members of the MeUGP family, 10 genes were mapped unevenly onto 6 chromosomes of the cassava genome: chromosomes 1 and 2 each carried three MeUGP genes, while chromosomes 13, 15, 16, and 18 each carried one. Interestingly, two genes, MeUGP4 and MeUGP9, were located in the subtelomeric regions of chromosomes 2 and 16, respectively (Figure 1). Previously, the regions near centromeres (pericentromeres) and near telomeres (subtelomeres) were suggested to be more permissive to the expansion of segmental duplications (Emanuel & Shaikh, 2001). Given their subtelomeric locations, we speculate that these two genes may have played important roles in various biological processes during the evolution of the cassava plant.
Only one gene, MeUGP11 (Manes.S044400.1), could not be assigned to a chromosome (Figure 1); it is located on an unplaced scaffold. This can be explained by the fact that the current cassava assembly anchors ~582.25 Mb onto 18 chromosomes, while approximately 2,001 scaffolds have not yet been mapped onto chromosomes (Bredeson et al., 2016). The cassava genome size was previously estimated to be approximately 772 Mb (Awoleye et al., 1994). Thus, MeUGP11 may be assigned to a chromosome as the cassava genome assembly improves.
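Taking the two figures above at face value, the share of the estimated genome carried by the 18 chromosome-scale sequences can be worked out directly. This ratio ignores uncertainty in the genome size estimate, so it is only a rough indication.

```latex
\frac{582.25\ \text{Mb}}{772\ \text{Mb}} \approx 0.754,
\qquad
772\ \text{Mb} - 582.25\ \text{Mb} = 189.75\ \text{Mb}
```

That is, roughly a quarter of the estimated genome (about 190 Mb) is not represented on the chromosome-scale sequences, which is consistent with some genes, such as MeUGP11, still residing on unplaced scaffolds.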
For further structural analysis, the exon/intron organization of the MeUGP gene family was retrieved using the Gene Structure Display Server (GSDS) (Hu et al., 2015). As shown in Figure 2, MeUGP genes classified in the same clade often shared the same structure. For example, MeUGP1 and MeUGP4 each contained 13 exons and 12 introns, while the pairs MeUGP2 and MeUGP5, MeUGP8 and MeUGP6, and MeUGP11 and MeUGP9 appeared to share the same gene organization, although their genomic DNA sequences differed. These results indicate that the structure of the genes encoding UGPase in cassava is quite complex, and that exon separation within the gene family may have arisen under natural selection pressure, as previously described (Gorlova et al., 2014).
As shown in Figure 2, most MeUGP genes were expressed in at least one tissue. Among them, MeUGP4 was highly expressed in the SAM, while MeUGP10 appeared to be specific to the RAM. Two genes, MeUGP1 and MeUGP2, were not expressed in any tissue. Previously, the expression profiles of genes encoding sugar transporters (SWEETs) were analyzed in 11 tissues of the cassava plant under normal conditions: MeSWEET7 was expressed in the FEC and OES, MeSWEET18 was specific to the RAM, and MeSWEET26 and MeSWEET27 were expressed in the RAM and SAM (Ha et al., 2017). Taken together, these results suggest that MeUGP4 and MeUGP10 may play critical roles in starch metabolism in the apical meristems and may thus be involved in the growth and development of cassava plants.
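The tissue-level summary above (genes expressed in at least one tissue, genes with no detectable expression, and tissue-restricted genes) can be derived mechanically from a gene-by-tissue FPKM table. The sketch below assumes the 10 FPKM "expressed" threshold from the Methods; the gene names and FPKM values in the example are invented placeholders, not the Wilson et al. (2017) measurements.

```python
EXPRESSED_FPKM = 10.0  # "expressed" threshold described in the Methods


def summarize_profiles(profiles: dict, threshold: float = EXPRESSED_FPKM) -> dict:
    """profiles maps gene -> {tissue: FPKM}; returns genes grouped by expression pattern."""
    expressed_any, not_expressed, tissue_specific = [], [], {}
    for gene, by_tissue in profiles.items():
        tissues = [t for t, value in by_tissue.items() if value >= threshold]
        if not tissues:
            not_expressed.append(gene)
            continue
        expressed_any.append(gene)
        if len(tissues) == 1:
            tissue_specific[gene] = tissues[0]  # above threshold in exactly one tissue
    return {
        "expressed_in_at_least_one": expressed_any,
        "not_expressed": not_expressed,
        "tissue_specific": tissue_specific,
    }


# Invented placeholder values for illustration only
toy = {
    "geneX": {"RAM": 2.0, "SAM": 180.0, "FEC": 15.0},
    "geneY": {"RAM": 40.0, "SAM": 3.0, "FEC": 0.5},
    "geneZ": {"RAM": 0.2, "SAM": 0.8, "FEC": 0.0},
}
print(summarize_profiles(toy))
```

Here "specific" simply means above the threshold in exactly one tissue; more refined specificity measures that weigh relative expression levels are not used in the text.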

Conclusions
Eleven members of the UGPase gene family were identified in the cassava genome. Ten of these genes were located on six of the eighteen cassava chromosomes, and MeUGP4 and MeUGP9 were mapped to the subtelomeric regions of chromosomes 2 and 16, respectively.
The genomic DNA sequences of the MeUGP genes ranged from 3,200 to 11,601 bp, and their CDS lengths ranged from 831 to 3,654 bp. The MeUGP genes also showed complex exon/intron organizations.
Based on previous RNA-seq data, most MeUGP genes were expressed in at least one tissue. MeUGP4 was highly expressed in the SAM, while MeUGP10 was more specific to the RAM. Two genes, MeUGP1 and MeUGP2, were not expressed in any tissue.