An improved genetic system for detection and analysis of protein nuclear import signals

Background Nuclear import of proteins is typically mediated by their physical interaction with soluble cytosolic receptor proteins via a nuclear localization signal (NLS). A simple genetic assay to detect active NLSs based on their function in the yeast Saccharomyces cerevisiae has been previously described. In that system, a chimera consisting of a modified bacterial LexA DNA binding domain and the transcriptional activation domain of the yeast Gal4 protein is fused to a candidate NLS. A functional NLS will redirect the chimeric fusion to the yeast cell nucleus and activate transcription of a reporter gene. Results We have reengineered this nuclear import system to expand its utility and tested it using known NLS sequences from adenovirus E1A. Firstly, the vector has been reconstructed to reduce the level of chimera expression. Secondly, an irrelevant "stuffer" sequence from the E. coli maltose binding protein was used to increase the size of the chimera above the passive diffusion limit of the nuclear pore complex. The improved vector also contains an expanded multiple cloning site and a hemagglutinin epitope tag to allow confirmation of expression. Conclusion The alterations in expression level and composition of the fusions used in this nuclear import system greatly reduce background activity in β-galactosidase assays, improving sensitivity and allowing more quantitative analysis of NLS bearing sequences.


Background
The proper function of many proteins requires that they are targeted to their correct sub-cellular compartments via specific localization signals. Nuclear import of many cellular and viral proteins is typically mediated by their physical interaction with soluble cytosolic receptor proteins via nuclear localization signals (NLS) [1]. A canonical NLS is characterized by a short single stretch of basic amino acids, as exemplified by the NLS sequence of the large T antigen of the simian virus 40 (PKKKRKV) [2]. These monopartite signals generally contain at least three basic amino acids (B) with a consensus sequence fitting B4, P(B3X), PXX(B3X) or B3(H/P), where P is proline, H is histidine, X is any amino acid and letters in parentheses can be in any order [1]. Alternatively, bipartite NLSs contain two short stretches of basic amino acids separated by a non-conserved sequence, as found in the cellular nucleoplasmin protein (KRPAATKKAGQAKKKK) [3].
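The monopartite consensus patterns listed above can be illustrated with a short scan over a protein sequence. The following is a minimal sketch, not the PSORT implementation: it assumes that "basic" (B) means lysine or arginine, that (B3X) means a four-residue window with at least three basic residues in any order, and that B3(H/P) means a four-residue window with three basic residues plus one histidine or proline in any position. The function name `find_monopartite_motifs` is hypothetical, and bipartite signals are not covered.

```python
BASIC = frozenset("KR")  # assumption: "basic" (B) means lysine (K) or arginine (R)


def _count_basic(window):
    """Count basic residues in a short stretch of sequence."""
    return sum(residue in BASIC for residue in window)


def find_monopartite_motifs(sequence):
    """Return (pattern, start, matched_text) tuples; start is a 0-based index."""
    seq = sequence.upper()
    hits = []
    for i in range(len(seq)):
        w4 = seq[i:i + 4]
        if len(w4) == 4:
            n_basic = _count_basic(w4)
            if n_basic == 4:
                # B4: four consecutive basic residues
                hits.append(("B4", i, w4))
            elif n_basic == 3 and any(r in "HP" for r in w4):
                # B3(H/P): three basic residues plus one H or P, any order
                hits.append(("B3(H/P)", i, w4))
        if seq[i] == "P":
            # P(B3X): proline followed directly by the four-residue window.
            # PXX(B3X): proline, two arbitrary residues, then the window.
            for name, offset in (("P(B3X)", 1), ("PXX(B3X)", 3)):
                window = seq[i + offset:i + offset + 4]
                if len(window) == 4 and _count_basic(window) >= 3:
                    hits.append((name, i, seq[i:i + offset + 4]))
    return hits
```

Under this reading, the SV40 large T antigen sequence PKKKRKV yields a B4 hit among others, and a short basic stretch such as KRPRP matches only the B3(H/P) pattern. A hit indicates similarity to the consensus and says nothing about whether the sequence functions as an NLS.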
Proteins containing NLSs are imported into the nucleus by interacting in the cytosol with members of the importin α family of NLS receptors (also known as karyopherin α). Subsequent heterodimerization of importin α with importin β (also known as karyopherin β) and interaction with components of the nuclear pore complex (NPC) leads to translocation through the nuclear envelope in a GTP-dependent fashion. Inside the nucleus, the importin α-substrate complex dissociates from importin β and the substrate subsequently dissociates from importin α [4]. The yeast Saccharomyces cerevisiae expresses a single importin α, Srp1 [5], whereas multiple isoforms have been isolated in higher eukaryotes. Each isoform of importin α has distinct substrate specificities, suggesting that this may contribute to the regulation of nuclear import [1]. Despite the fact that many thousands of cellular proteins are transported into the nucleus, very few NLSs have been characterized in detail, and existing work has been biased towards classical monopartite or bipartite signals. As nuclear transport regulates, at least in part, an extraordinarily diverse range of cellular processes including cell cycle, signal transduction, apoptosis and circadian rhythm, the identification of mechanisms regulating nuclear import is a valuable area of investigation.
The largest protein encoded by the human adenovirus type 5 (Ad5) early region 1A (E1A) gene contains 289 amino acid residues (289R). A highly related 243 amino acid (243R) protein is produced by alternative splicing [6]. E1A is present in roughly equal amounts in the nucleus and cytoplasm and both the 289R and 243R proteins contain a well characterized monopartite NLS (KRPRP) located at their C-terminus [7]. This NLS mediates nuclear import in vitro and in vivo and shows a distinct preference for human importin α3 in vitro [8]. A second non-conventional NLS with the consensus sequence FV(X)7-20MXSLXYM(X)4MF spans residues 142-182 and is unique to the larger 289 residue E1A protein [9,10]. This sequence does not resemble other known NLSs and appears to be regulated developmentally [11]. At least one other non-canonical NLS is present elsewhere within the E1A protein, as residues 23-120 of E1A have been reported to be sufficient to mediate nuclear accumulation via an unknown pathway in microinjected Xenopus laevis oocytes [12]. Another study, which used indirect immunofluorescence to examine the subcellular localization of 243R E1A in virally infected baby rat kidney cells, also demonstrated that the N-terminal region spanning residues 30-85 was involved in nuclear localization. In addition, this latter study confirmed that the well characterized C-terminal NLS was necessary for efficient nuclear localization and further suggested a role for a region encompassing residues 186-220 in infected cells [13]. It is not currently known why E1A contains so many different NLSs. However, it is likely that the actions of E1A, like those of some cellular proteins [14], are closely regulated through modulation of the level of nuclear import. As the E1A proteins possess such an array of nonconventional nuclear import functions, they provide an excellent model system to identify and characterize novel mechanisms that contribute to nuclear localization.
The lack of clearly defined and consistent NLS consensus motifs makes it difficult to predict their presence in a protein of interest. Until recently, the only practical way to identify a functional NLS was by microinjecting or otherwise expressing the test protein in eukaryotic cells, forming heterokaryons or using in vitro transport systems. As an alternative, an easier and more sensitive method that exploited yeast genetics was devised to detect an active NLS, which is based on the expression of the test protein fused to a modified LexA DNA binding domain (DBD) and the Gal4 transcriptional activation domain (AD) in the yeast S. cerevisiae [15]. This transcription based assay relies on the ability of a functional NLS to allow the chimera to enter the yeast nucleus and activate transcription of a LexA responsive β-galactosidase or HIS3 reporter gene. In the absence of a functional NLS, the fusion protein is not efficiently imported into the nucleus, and is unable to activate transcription. As a result, this assay provides a simple qualitative measure of NLS function based on β-galactosidase activity assays or by monitoring yeast growth on medium lacking histidine respectively. This system is not limited to the identification of yeast NLSs, as the nuclear import apparatus is highly conserved between yeast and higher eukaryotic cells [1,15].
We have obtained this system and verified that Ad5 E1A actively functions to promote nuclear import under these conditions. However, the original system had two major problems that have now been corrected. A high level of background activity was initially observed, which was in part related to the high levels of test protein expression from the original multicopy pNIA vector. To solve this issue, we reconstructed the expression vector as a single copy yeast plasmid. The small size of the initial pNIA LexA fusion allows the chimera to passively diffuse into the nucleus, increasing the background activity. We have reengineered the vector with an additional "stuffer" sequence that encodes a portion of the E. coli maltose binding protein that contains no NLS activity. This increases the size of the chimera well above the commonly accepted 50 kDa passive diffusion limit of the nuclear pore complex [1] and greatly reduces background activity in the β-galactosidase assays. The improved vector also contains an expanded multiple cloning site that simplifies cloning of target sequences and a hemagglutinin (HA) epitope tag to allow confirmation of expression. Using this improved system, we demonstrate that it faithfully detects all the known E1A NLS functions in a quantitative fashion. Thus, this improved system may aid in the identification and characterization of novel NLS activities.

Construction and evaluation of an improved vector for detecting nuclear import
A simple transcription based assay in yeast for detecting nuclear import has been described [15]. Using this system, we evaluated nuclear import conferred by fusing the full length Ad5 289R E1A protein or the N-terminal 82 amino acids in frame to the modified LexA DBD-Gal4 AD chimera (Fig. 2). Although a substantial increase in β-galactosidase activity was detected with E1A 1-82 as compared to the empty pNIA vector, surprisingly little change was observed with the full length 289R E1A protein or the well studied SV40 large T antigen NLS. Based on these results, we pursued a series of modifications to improve the quantitative functionality of this system and its general utility.
As described in the Materials and Methods and illustrated in Fig. 1, we first reconstructed the pNIA LexA DBD-Gal4 AD into another multicopy plasmid backbone originally derived from YEplac181 [16]. This yielded pNIA-2µ, which has a known nucleotide sequence, contains an improved polylinker and has an HA epitope tag to allow convenient detection of expression of the LexA fusion. However, this vector contains the entire ADH1 promoter, which is substantially more active than the truncated ADH1 promoter in the original pNIA. Tests with the same candidate NLSs in pNIA-2µ showed a uniformly high level of activity for all constructs, including the empty vector (Fig. 2). This was likely caused by the very high level of expression of the LexA DBD fusions from the strong promoter in a multicopy plasmid. Indeed, Western blot analysis of the expression levels of the various fusions using an anti-HA antibody confirmed abundant expression, with the exception of E1A 1-82 (Fig. 3). Recalculation of the nuclear import based on relative levels of protein expression suggested that E1A 1-82 continues to display import activity above that of the corresponding empty vector (Fig. 2B).
To reduce the high level of protein expression from pNIA-2µ, we shifted the entire expression cassette into the single copy YCplac111 vector [16] and tested the same candidate NLSs in pNIA-CEN. Western blot analysis with the HA tag confirmed that expression levels of the fusions were lower in yeast transformed with the single copy plasmid than with the multicopy plasmid (Fig. 3). Again, E1A 1-82 was consistently expressed at a lower level than the other constructs. The background activity of the empty pNIA-CEN vector was dramatically lower than that of pNIA-2µ (Fig. 2A). Importantly, all candidate NLSs conferred a strong increase in β-galactosidase activity in pNIA-CEN, regardless of whether the values were normalized to protein expression (Fig. 2B). These results clearly show that overexpression of the LexA DBD fusions by the multicopy plasmid obscures nuclear import.
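The recalculation of import activity relative to protein expression, and the comparison with the empty vector, can be sketched as simple arithmetic. The sketch below assumes that normalization means dividing β-galactosidase activity by the relative expression level estimated from the Western blot; the source does not spell out the exact procedure. The function names and all numbers are hypothetical illustrations and are not the values shown in Fig. 2 or Fig. 3.

```python
def normalize_to_expression(activity, relative_expression):
    """Divide reporter activity by relative protein expression (assumed method)."""
    if relative_expression <= 0:
        raise ValueError("relative expression must be positive")
    return activity / relative_expression


def fold_over_empty(activities, empty_key="empty"):
    """Express each construct's activity as a fold change over the empty vector."""
    baseline = activities[empty_key]
    if baseline <= 0:
        raise ValueError("empty vector activity must be positive")
    return {name: value / baseline for name, value in activities.items()}


# Hypothetical illustrative numbers only; NOT data from the figures.
raw_activity = {"empty": 2.0, "E1A 1-82": 40.0, "SV40 NLS": 30.0}
relative_expression = {"empty": 1.0, "E1A 1-82": 0.5, "SV40 NLS": 1.0}

normalized = {
    name: normalize_to_expression(raw_activity[name], relative_expression[name])
    for name in raw_activity
}
fold = fold_over_empty(normalized)
```

With these made-up inputs, a construct expressed at half the level of the empty vector has its apparent activity doubled by normalization, which is the kind of adjustment that matters for a poorly expressed fusion such as E1A 1-82.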
It is commonly accepted that the nuclear pore complex has a passive diffusion limit of approximately 50 kDa [1]. The LexA DBD-Gal4 AD-HA fusion produced by the empty pNIA-2µ and pNIA-CEN vectors is 363 amino acids in size with a predicted molecular weight of about 40 kDa (Fig. 3). Thus, passive diffusion of the test fusion into the nucleus likely occurs and this could contribute to the relatively high background activity in the β-galactosidase assays. To overcome this issue, we inserted a "stuffer" fragment corresponding to a portion of the E. coli maltose binding protein (MBP) in frame between the LexA DBD and Gal4 AD to create pNIA-CEN-MBP. Analysis of this fragment of MBP with PSORT did not detect any sequences predicted to function as an NLS [4], and this was confirmed experimentally (Fig. 2). Insertion of this fragment increases the molecular weight of the test fusion to about 80 kDa (Fig. 3), well above the passive diffusion limit. All candidate NLSs confer a strong increase in β-galactosidase activity in pNIA-CEN-MBP. Importantly, the background activity of the empty vector decreased to nearly undetectable levels, yielding very large fold differences between the empty vector and vectors containing control NLSs (Fig. 2B). When compared with pNIA-CEN, these results suggest that passive diffusion of the smaller LexA DBD fusions obscures active nuclear import conferred by the candidate NLSs. Thus, the pNIA-CEN-MBP vector appears ideally suited for the analysis of short sequences that may function as NLSs. This is exemplified by the short SV40 large T antigen NLS, which showed the most dramatic increase in import activity between pNIA-CEN and pNIA-CEN-MBP (Fig. 2).
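The size argument above can be checked with a back-of-the-envelope estimate. The sketch below assumes the common rule of thumb of roughly 110 Da per amino acid residue, which is not taken from the source; the 50 kDa limit and the 363 residue length are the figures quoted in the text. The function names are hypothetical, and a sequence-based calculation would be needed for an accurate mass.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0     # rule-of-thumb average; an assumption
PASSIVE_DIFFUSION_LIMIT_KDA = 50.0  # approximate limit quoted in the text [1]


def estimate_mass_kda(n_residues):
    """Rough protein mass in kDa from residue count alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def may_diffuse_passively(mass_kda, limit_kda=PASSIVE_DIFFUSION_LIMIT_KDA):
    """True if the mass falls below the assumed passive diffusion limit."""
    return mass_kda < limit_kda


# The empty-vector fusion is 363 residues, giving roughly 40 kDa.
empty_fusion_kda = estimate_mass_kda(363)
```

The estimate for the 363 residue fusion comes to about 39.9 kDa, consistent with the predicted value of about 40 kDa and below the limit, whereas the roughly 80 kDa MBP-containing fusion lies well above it.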

Analysis of nuclear import conferred by various portions of E1A
As described in the introduction, E1A contains a canonical NLS with the sequence KRPRP spanning residues 285-289. In addition, a second non-conventional NLS with the consensus sequence FV(X)7-20MXSLXYM(X)4MF, spanning residues 142-182, has been described and a third activity is localized within residues 30-85. PSORT analysis [4] of E1A also predicts the presence of a second canonical NLS with the sequence RRPK spanning residues 205-208, which is located within a region previously identified as influencing nuclear localization of E1A in infected baby rat kidney cells [13]. Although the modified pNIA can detect the SV40 large T antigen NLS, and at least the N-terminal NLS in E1A (Fig. 2), it was unclear whether this yeast system can faithfully detect other NLSs that function in mammalian cells. To address this issue, we tested a number of fragments of E1A in pNIA-CEN-MBP to see if the various regions previously shown to confer nuclear import in a variety of other test systems were detected using this yeast system (Fig. 4). As previously shown in Fig. 2, full length 289R E1A or E1A 1-82 function to direct nuclear localization of the LexA chimera. In addition, E1A 139-204, which contains a developmentally regulated non-canonical NLS, also functions in the yeast system. Similarly, E1A 187-289, which encompasses the well characterized canonical NLS, as well as the PSORT predicted NLS, confers nuclear import. Deletion of this well characterized NLS, to generate E1A 187-281, greatly reduces import, suggesting that the major activity resides in the well characterized canonical NLS. In agreement, fusion of just this canonical NLS, contained within E1A 282-289, is sufficient to induce strong import of the chimera. Fusion of E1A 201-218, which contains a canonical NLS predicted by PSORT, exhibited the lowest level of import activity. As such, this signal may not function efficiently, although it is located within a region of E1A known to influence nuclear localization in infected cells [13]. Thus, the yeast nuclear import system appears quite versatile and is capable of recognizing all the known functional NLSs within E1A, whether they represent canonical or non-canonical sequences. Furthermore, this system may be sensitive enough to uncover weak import sequences. However, one obvious caveat of this technology, or any other approach that uses small fragments of proteins for identifying sequences directing nuclear import, is that these activities may not function in the context of the full length folded protein. In many cases, the effect of mutating the putative NLS on nuclear localization of the full length protein could be tested, although this is not feasible in proteins such as E1A that contain multiple nuclear localization signals.
Schematic of the construction of pNIA derivatives used in this study
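The logic of the fragment analysis, namely which described or predicted import regions fall within each tested fragment, can be laid out as a small interval comparison. This is only a bookkeeping sketch using the residue ranges quoted in the text; the dictionary labels and function names are hypothetical, and an overlap does not imply that a fragment carries a functional signal.

```python
# Residue ranges as quoted in the text (1-based, inclusive).
NLS_REGIONS = {
    "N-terminal region (30-85)": (30, 85),
    "non-conventional NLS (142-182)": (142, 182),
    "PSORT-predicted RRPK (205-208)": (205, 208),
    "canonical KRPRP (285-289)": (285, 289),
}

FRAGMENTS = {
    "E1A 1-82": (1, 82),
    "E1A 139-204": (139, 204),
    "E1A 187-289": (187, 289),
    "E1A 187-281": (187, 281),
    "E1A 282-289": (282, 289),
    "E1A 201-218": (201, 218),
}


def overlaps(a, b):
    """True if two inclusive residue ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def regions_in_fragment(fragment):
    """List the named regions that overlap a fragment, in definition order."""
    return [name for name, region in NLS_REGIONS.items() if overlaps(fragment, region)]


summary = {name: regions_in_fragment(span) for name, span in FRAGMENTS.items()}
```

Read this way, E1A 187-289 overlaps both the PSORT-predicted and the canonical C-terminal signals, while E1A 187-281 retains only the PSORT-predicted one, which matches the reasoning that the drop in import on deletion points to the canonical KRPRP signal as the major activity. Note that E1A 1-82 only partially overlaps the 30-85 region.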

Conclusion
The availability of a simple and quantitative transcription based method for the identification and functional analysis of nuclear import in yeast provides a powerful alternative method for studying the complexities of nucleocytoplasmic transport. Although useful for the analysis of individual proteins, this technology may also allow high throughput analysis of nuclear import potential and could be readily applied to large scale screens of short random sequences to identify novel potential NLSs.
Western blot analysis of the expression of the various chimeras