Label-free serum proteomics and multivariate data analysis identifies biomarkers and expression trends that differentiate Intraductal papillary mucinous neoplasia from pancreatic adenocarcinoma and healthy controls

Intraductal papillary mucinous neoplasias (IPMN) are potentially malignant cystic tumors of the pancreas. IPMN can progress from low to moderate to high grade dysplasia and further to IPMN associated carcinoma. Often the difference between the benign and malignant nature of an IPMN is not clear preoperatively. We aim to elucidate molecular expression patterns of various grades of IPMN and pancreatic carcinoma. Additionally, we suggest potential novel biomarkers to differentiate IPMN from healthy individuals and from pancreatic carcinoma, to enable early detection and to help in differential diagnosis in the future. We performed a retrospective label-free proteomic analysis of serum samples from 44 patients with various grades of benign IPMN or IPMN associated carcinoma and 11 healthy controls. Proteomic data were further analyzed by various multivariate statistical methods. Four groups of samples (low-grade IPMN, high-grade IPMN, pancreatic carcinoma and age- and sex-matched healthy controls) were compared with ANOVA. Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) modeling gave an S-plot for feature selection. Stringently selected potential markers were further evaluated with ROC curve analysis, and the area under the curve (AUC) was calculated. Differentially expressed proteins were used for pathway analysis. Linear trend analysis (Mann-Kendall test) was used to identify significant increasing or decreasing trends from healthy to low-grade IPMN to high-grade IPMN to pancreatic carcinoma. Based on protein expression (436 proteins quantified), PCA separated most sample groups from each other. The S-plot selected biomarker panels with moderate to very high AUC values for differentiating controls from low-grade IPMN, high-grade IPMN and carcinoma. Linear trend analysis identified 12 proteins with a consistently increasing or decreasing trend among the groups. We found potential biomarkers to differentiate healthy controls from different degrees of dysplasia and pancreatic carcinoma.
These biomarkers can classify IPMN, carcinoma and healthy controls from each other which is an unmet clinical need. Data are available via ProteomeXchange with identifier PXD009139. Kininogen-1 was able to differentiate healthy persons from low and high-grade IPMN. Retinol binding protein-4 could classify the low-grade IPMN from pancreatic carcinoma. Twelve proteins including apolipoproteins and complement proteins had significantly increasing or decreasing trends from healthy to low to high-grade IPMN to pancreatic carcinoma.


Background
Intraductal Papillary Mucinous Neoplasia (IPMN) tumors are considered possibly precancerous and starting from low grade dysplasia some of them develop into IPMN associated pancreatic cancer. The prognosis of IPMN is good if the tumor is operated before malignant transformation occurs. On the other hand, not nearly all IPMNs undergo malignant transformation during the patient's life time. The criteria for surgery are listed in the European and International consensus guidelines [1,2]. The risk of cancer is higher in IPMNs involving the main duct. Progression to cancer may occur years after diagnosis and therefore patients are followed up as long as they are fit for major surgery. Current way of life-long follow up is laborious for both patients and hospitals, and total costs are considerable.
New biomarkers are needed to help identify early IPMN cases and define the grades of dysplasia based on minimally invasive techniques. At the same time it's important to differentiate the low-risk patients from those with high grade dysplasia with a high risk of developing carcinoma and from patients already carrying IPMN associated carcinoma. Less invasive markers that stratify the risk population will immensely help the clinical decision making. Pancreatic biopsy is an invasive procedure and diagnostic samples can be difficult to obtain. The same is true for fluid samples from pancreatic cysts. Therefore, biomarkers determined from blood samples are an important research field because they can be easily used for long follow ups.
Differences in proteins found in serum samples could provide a way to differentiate IPMNs with different states of dysplasia. The aim of our study was to compare the serum protein profiles of patients with IPMN with low (LG) or high grade (HG) dysplasia, patients with IPMN associated carcinoma (IPMNC) and healthy controls (CTRL). We quantified 436 serum proteins with two or more unique peptides from 55 serum samples including 11 healthy controls. The proteomic dataset was further analyzed with advanced multivariate statistical techniques to find the protein features which could be used as potential biomarkers of various grades of dysplasia, as well as to differentiate dysplasia samples from pancreatic carcinoma. The multivariate data analysis techniques employed in the current study included principal component analysis (PCA) and orthogonal projections to latent structures-discriminant analysis (OPLS-DA), which were followed by pathway analysis, linear trend analysis (Mann-Kendall test) and receiver operating characteristic (ROC) curve analysis. These statistical techniques gave us a biomarker panel to differentiate various types of IPMN from each other and from pancreatic carcinoma and healthy controls. Additionally, linear trend analysis identified proteins having a clear increasing or decreasing expression trend from healthy controls to low-grade IPMN to high-grade IPMN to pancreatic carcinoma. These proteins potentially have a mechanistic role in the progression and/or development of the neoplasm.

Materials and methods
At Helsinki University Hospital (HUH) 98 patients were operated for pancreatic intraductal papillary mucinous neoplasia (IPMN) in 2000-2015. Preoperative frozen serum sample was available for 44 patients out of the 98. Clinical data was collected and re-evaluated. Routine surgical specimens from the archives of Department of Pathology were re-evaluated by an experienced pathologist. In our patient series 13 patients had low grade dysplasia, 10 patients high grade dysplasia and 21 carcinoma. Fourteen tumors were main duct type, 25 branch duct type and 5 mixed type IPMN. The majority of the tumors were pancreatobiliary (31 samples), eight were intestinal and three were gastric subtype. We did not receive histological samples of two cases from the archives and thus we were not able to re-evaluate the subtype of the tumor of these two cases.
The 21 cancer samples were mainly classified as ductal adenocarcinoma (12 samples), 7 samples were IPMN associated invasive carcinoma and 2 colloid carcinoma. The serum samples were stored in − 80°C until analyzed. Control serum samples were collected from 11 age and gender matching healthy individuals. This is a multigroup comparison retrospective study. ANOVA and advanced multivariate statistical techniques were used for data analysis. Receiver operating characteristic curve analysis was also used to evaluate performance of the potential markers.
The study was approved by the Surgical Ethics Committee of Helsinki University Hospital (Dnro HUS 226/ E6/06, extension TMK02 §66 17.4.2013), the use of archive tissue material by the National Supervisory Authority of Welfare and Health (Valvira Dnro 10,041/06.01.03.01/ 2012), and the collection of serum samples by the patients (written informed consent).

Trypsin digestion
Serum samples were thawed and immediately processed essentially as described previously in detail [3][4][5]. Briefly, the top 12 high-abundance proteins were depleted using Pierce™ Top 12 Abundant Protein Depletion Spin Columns (Thermo Fisher). Depleted serum samples were used for total protein determination using the BCA assay (Pierce, Thermo Fisher). Equal amounts of protein were reduced, alkylated and trypsin digested before mass spectrometry analysis. Further details are given in Additional file 1: Supplementary methods.

Liquid chromatography-mass spectrometry (LC-MS) and quantitation

Ultra-performance liquid chromatography (UPLC) and ultradefinition MSE (UDMSE)
Four μL of each sample (1.4 μg of peptides) were injected into a nanoAcquity UPLC system (Waters Corporation, MA, USA). The separation device used prior to MS was a TRIZAIC nanoTile 85 μm × 100 mm HSS-T3u wTRAP. The buffers used, the analytical gradient and the data acquisition parameters are given in Additional file 1: Supplementary methods. The workings of UDMSE have been described briefly in Additional file 1: Supplementary methods and in detail previously [6].

Data analysis
Data analysis was essentially performed as described previously [3][4][5]. Briefly, raw files were imported to Progenesis QI for Proteomics software with lock mass correction using doubly charged Glu1-Fibrinopeptide B (785.8426 m/z). Runs were aligned automatically and peak picking was performed, both with default parameters. Protein Lynx Global Server was used for peptide identification and label free quantitation was performed according to Silva et al. [7]. Further details are given in Additional file 1: Supplementary methods.

Statistics and pathway analysis
PCA, OPLS-DA modelling and ROC curve details are given in Additional file 1: supplementary methods and figures. Pathway analysis was performed with FunRich 3.0 [8]. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [9] partner repository with the dataset identifier PXD009139.

Metadata
Forty-seven patients, who were operated for IPMN at Helsinki University Hospital in 2000-2015 were included in the study. They included 15 patients with low grade dysplasia (LG), 10 patients with high grade dysplasia (HG) and 22 with IPMN associated carcinoma (IPMNC). All of them had given a serum sample before operation with informed written consent. We also collected serum samples from 11 healthy volunteers to be used as control samples. One sample failed normalization and two others had breast cancer and mucinous carcinoma in addition to IPMN. These 3 samples were removed from the further analysis and only 44 samples were further analyzed. All samples included in the study and analyses are given in Additional file 2: Table S1A.

Ultra-high-definition MSE (UDMSE) and statistical analysis
Fifty-five serum samples including 11 controls were analysed in UDMSE mode (data-independent acquisition) as described in the methods. Nine hundred proteins were identified; however, this number also included proteins identified with fewer than two unique peptides. We filtered the data with a 12 ppm mass error, + 1 to + 4 charge and at least 2 unique peptides, which gave us 436 reliably identified and quantified proteins. These 436 proteins are reported in this study and only they were used for further analyses (Additional file 2: Table S1B). These 436 proteins contained 11,711 total peptides, of which 7306 were unique to their corresponding proteins. The confidence score was 6 for putative uncharacterized protein FLJ11871 and 4172 for complement C3. Notably, within-class and across-class variation for all the proteins was relatively low (median %CV over all proteins in controls: 22.20, LG: 26.91, HG: 28.21 and IPMNC: 38.52). The median of these class medians was 27.56. This suggests that our phenotypes were homogeneous and suitable for inter-class comparisons. At the same time, a low %CV across all classes makes it harder to dig out the proteins which differ biologically across the classes. We made 6 comparisons across classes and used stringent multivariate classification techniques to find the protein expressions which could classify any 2 different classes in pairwise comparisons. For every comparison, principal component analysis and orthogonal projections to latent structures-discriminant analysis (OPLS-DA, visualized by S-Plot) were performed. OPLS-DA is a stringent multivariate data-analysis technique which can be used to find predictive variance and separate out uncorrelated variance from the data. Note that we report all proteins passing the p(corr)[1] cutoff value of + 0.7 or − 0.7; however, in the main text we only report those among them which also pass the p[1] cutoff of + 0.1 or − 0.1.
These proteins are less likely to have spurious correlations with the phenotypes. Further details about these cutoffs are provided in the discussion section. For ease of understanding, the section is divided into six subsections encompassing comparisons between all types of samples used in the study. The ANOVA p values for these proteins for each individual comparison, as well as fold changes and other parameters, can be found in Additional file 2: Table S1C.
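
The per-class variation summary quoted above can be illustrated with a minimal sketch. It assumes that %CV means 100 × sample standard deviation / mean, computed per protein within a class and then summarized by the median over proteins; the source does not spell out the formula. The function names and the toy abundances are hypothetical, and only the four class medians are taken from the text.

```python
from statistics import mean, median, stdev


def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def median_percent_cv(abundances_by_protein):
    """Median %CV over all proteins within one sample class.

    abundances_by_protein maps a protein name to a list of abundances,
    one value per sample in the class.
    """
    return median(percent_cv(v) for v in abundances_by_protein.values())


# Hypothetical abundances for three proteins in one class (not study data).
toy_class = {
    "protein_A": [100.0, 120.0, 80.0],
    "protein_B": [10.0, 10.0, 10.0],
    "protein_C": [50.0, 150.0, 100.0],
}
class_median_cv = median_percent_cv(toy_class)

# Class medians as reported in the text (CTRL, LG, HG, IPMNC).
reported_class_medians = [22.20, 26.91, 28.21, 38.52]
median_of_medians = median(reported_class_medians)
```

With four class medians, the median is the average of the two middle values, which reproduces the 27.56 reported in the text.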

Control vs low grade dysplasia
In this comparison, which was mainly done to identify very early detection biomarkers (which could also work as screening biomarkers), 108 proteins passed the ANOVA p value cutoff (≤0.05). Of these, 34 proteins were decreased in LG samples while 74 proteins were increased (Additional file 2: Table S1C). Among the decreased proteins, 15 had a fold change (FC) of more than 1.5, including four with an FC of more than 2. Among these were Cytochrome c-type heme lyase (mean normalized abundance 588.43 ± 94.82 in LG and 1202.81 ± 162.42 in controls) and Serine protease inhibitor Kazal-type 2 (LG: 13463.42 ± 1535.74 and controls: 27433.22 ± 3376.14). Similarly, among the proteins increased in LG, 21 had an FC of more than 1.5, including 13 with an FC of more than 2. A few key proteins which had higher levels in LG are lipoprotein lipase (mean normalized abundance in LG: 15660.04 ± 2962.59 compared to 3864.24 ± 492.72 in controls) and Alanyl-tRNA editing protein Aarsd1 (LG: 58179.62 ± 7025.68 compared to 16523.85 ± 1326.76 in controls).
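
As a worked check of the fold changes quoted above, the following sketch recomputes them from the reported group means. It assumes that fold change is expressed as the ratio of the larger to the smaller group mean, so that decreased proteins also have FC > 1, which matches how the text reports them; the function name and the short protein labels are illustrative only.

```python
def fold_change(mean_lg, mean_ctrl):
    """Return (ratio, direction) with ratio = larger mean / smaller mean.

    Assumed convention: the ratio is always >= 1 and the direction string
    states in which group the protein is more abundant.
    """
    if mean_lg <= 0 or mean_ctrl <= 0:
        raise ValueError("mean abundances must be positive")
    if mean_lg >= mean_ctrl:
        return mean_lg / mean_ctrl, "increased in LG"
    return mean_ctrl / mean_lg, "decreased in LG"


# Mean normalized abundances (LG, controls) as reported in the text.
reported_means = {
    "Cytochrome c-type heme lyase": (588.43, 1202.81),
    "Serine protease inhibitor Kazal-type 2": (13463.42, 27433.22),
    "Lipoprotein lipase": (15660.04, 3864.24),
    "Alanyl-tRNA editing protein Aarsd1": (58179.62, 16523.85),
}

fold_changes = {
    name: fold_change(lg, ctrl) for name, (lg, ctrl) in reported_means.items()
}

for name, (ratio, direction) in fold_changes.items():
    print(f"{name}: FC = {ratio:.2f} ({direction})")
```

All four proteins come out with a ratio above 2, in line with their being listed among the proteins with FC of more than 2.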
The separation of these two classes of samples was visualized with principal component analysis (Fig. 1a, b). Figure 1a shows the PCA when all quantified proteins were considered. This panel already shows a clear separation of the two classes. Figure 1b shows the PCA when only proteins with ANOVA p values less than 0.05 were considered. This panel shows complete separation of the 2 classes.
Encouraged by the PCA results, we modelled the quantitative proteomics data with Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) which is visualized by S-Plot. OPLS-DA can classify the samples into their respective classes if the data has predictive variance. It also provides the protein IDs which are responsible for the classification. In other words, proteins which are strongly representative of the sample classes can be found using this technique. S-Plot was used for pairwise comparison (Controls vs LG) which is shown in Fig. 1c. S-Plot gave 8 proteins (p(corr)[1] of > + 0.7 or < − 0.7) which were able to classify the two classes. These proteins were increased in LG as shown in Additional file 2: Table S2. Figure 1 is a representative figure shown in the main body of the manuscript. For all other comparisons similar figures are shown in Additional file 1.

Fig. 1 A representative Principal Component Analysis (PCA) and Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) visualized by S-Plot. PCA of Control vs Low-grade dysplasia when all the proteins were considered for PCA (a) and when only ANOVA passing proteins (p value < 0.05) were considered for PCA (b). OPLS-DA visualized by S-Plot (c). The 0 to − 1 space contains proteins higher in controls and the 0 to + 1 space contains proteins higher in low-grade dysplasia. The X-axis is the p[1] loading, which tells about the magnitude of variance, and the Y-axis is p(corr)[1], which tells about the reliability of the predictive variance. A cutoff of > + 0.7 or < − 0.7 for p(corr)[1] was used to find significantly different proteins between the groups
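
The two S-Plot axes can be illustrated with a small sketch. It assumes the common S-plot definition in which the magnitude axis is the covariance between the predictive score vector and each variable, and the reliability axis is the corresponding correlation. The score vector is taken as given (it would come from the OPLS-DA software), the software's p[1] values may be scaled differently from the raw covariance used here, and all names and numbers are hypothetical rather than study data. Only the ± 0.7 correlation cutoff is taken from the text.

```python
import math


def _center(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def s_plot_coordinates(t_scores, column):
    """Return (covariance, correlation) between scores and one variable."""
    n = len(t_scores)
    t_c = _center(t_scores)
    x_c = _center(column)
    cov = sum(a * b for a, b in zip(t_c, x_c)) / (n - 1)
    sd_t = math.sqrt(sum(a * a for a in t_c) / (n - 1))
    sd_x = math.sqrt(sum(b * b for b in x_c) / (n - 1))
    corr = cov / (sd_t * sd_x) if sd_t > 0 and sd_x > 0 else 0.0
    return cov, corr


def select_by_pcorr(t_scores, table, cutoff=0.7):
    """Keep variables whose absolute correlation with the scores exceeds cutoff."""
    selected = {}
    for name, column in table.items():
        cov, corr = s_plot_coordinates(t_scores, column)
        if abs(corr) > cutoff:
            selected[name] = (cov, corr)
    return selected


# Hypothetical predictive scores for 8 samples (4 controls, then 4 cases).
t_scores = [-1.2, -0.9, -1.1, -0.8, 0.9, 1.1, 1.0, 1.0]

# Hypothetical abundances: one protein rising with the score, one falling,
# and one unrelated to it.
table = {
    "up": [10.0, 11.0, 10.5, 11.5, 20.0, 21.0, 19.5, 20.5],
    "down": [30.0, 29.0, 31.0, 30.5, 12.0, 11.0, 13.0, 12.5],
    "flat": [5.0, 7.0, 5.0, 7.0, 5.0, 7.0, 5.0, 7.0],
}

selected = select_by_pcorr(t_scores, table)
```

In this toy table only the two proteins that track the score pass the cutoff; the unrelated one does not. A second filter on the magnitude axis, like the ± 0.1 p[1] cutoff used for the main-text panels, would be applied in the same way once the scaling of p[1] is known.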

Control vs high grade dysplasia
Seventy proteins passed the ANOVA significance cutoff of 0.05, among which many were found to be increased or decreased (FC 1.1 to 13.4, Additional file 2: Table S1C). PCA was performed with all proteins (Additional file 1: Figure S1A) and showed complete separation when only proteins passing the ANOVA cutoff (0.05) were considered (Additional file 1: Figure S1B). The OPLS-DA S-Plot (Additional file 1: Figure S1C) provided four proteins passing the p(corr)[1] cutoff of + 0.7 or − 0.7, which are given in Additional file 2: Table S3.

Control vs IPMN carcinoma
The control vs carcinoma comparison provided 124 proteins passing the ANOVA cutoff, of which 50 were present in increased amounts in IPMN carcinoma patients and the rest were decreased in carcinoma (Additional file 2: Table S1C). PCA showed clear separation when all the proteins were considered (Additional file 1: Figure S2A), which became even better with ANOVA significant proteins (Additional file 1: Figure S2B). Controls were mostly clustered together in PCA space, while IPMN associated carcinoma (IPMNC) samples were largely heterogeneous, as they were spaced further apart from each other in PCA space. The OPLS-DA S-Plot (Additional file 1: Figure S2C) provided an exhaustive list of 27 proteins passing the p(corr)[1] cutoff of + 0.7 or − 0.7 (Additional file 2: Table S4). This highlights the clear serum protein profile differences between CTRL and IPMNC patients.

Low grade vs high grade dysplasia
This comparison was made to ascertain whether transformation from low grade to high grade dysplasia is also reflected in the serum protein profile.
LG and HG samples were mixed in PCA when all the proteins were considered and showed slightly better separation when only ANOVA significant proteins were considered (data not shown). Even then, some samples showed overlap between LG and HG in PCA space. This is expected, as the pathological classification into LG and HG is done on a subjective basis and serum protein profiles may not differ much because the disease is not fully developed at this stage. The OPLS-DA S-Plot did not give any proteins passing the p(corr)[1] cutoff specified in the previous analyses.

Low grade vs IPMN carcinoma
Among the proteins increased in LG compared to IPMNC, four proteins had a fold change > 2, which included Rhodopsin kinase (LG: 211.36 [...] Figure S3A); however, when ANOVA significant proteins were considered (Additional file 1: Figure S3B), several of them moved apart in the PCA space. This shows that it is possible to classify LG vs IPMNC patients, but some of them might still overlap. The observation that LG can be classified apart from IPMNC was further supported by the OPLS-DA S-Plot (Additional file 1: Figure S3C), which gave several proteins with good p(corr) values (Additional file 2: Table S5). These proteins can potentially be used to discriminate between these partially ambiguous clinical entities.

High grade dysplasia vs IPMN carcinoma
Some of the significantly different proteins between HG and IPMNC which had higher levels in IPMNC samples (FC > 3) were the same as in LG vs IPMNC (Sp17 and MYL6).
Other key proteins, which had higher levels in HG compared to IPMNC, were Sperm protein associated with the nucleus on the X chromosome N3 (SPANXN3, HG: 613.57 ± 137.57, IPMNC: 214.13 ± 42.30) and Cathelicidin antimicrobial peptide (CAMP, HG: 676.57 ± 114.04, IPMNC: 329.65 ± 55.14). The HG vs IPMNC PCA showed a similar pattern to LG vs IPMNC, again emphasizing that it is possible to differentiate between dysplasia and IPMNC based on the serum protein profile (Additional file 1: Figure S4 A & B). The OPLS-DA S-Plot (Additional file 1: Figure S4C) once again gave several proteins (Additional file 2: Table S6) capable of discriminating, with a minimally invasive serum proteomic profile, between these 2 clinically inseparable entities.

Network/pathway analysis
The protein lists originating from the above-mentioned comparisons were tested for enrichment of biological pathways using the publicly available tool FunRich 3.0 [8]. Initially, to get a glimpse of which pathways were altered, we considered all the significantly different proteins (ANOVA cutoff < 0.05) for enrichment of biological pathways. The top 10 pathways according to the -log10 p value of enrichment are given in Fig. 2. These top 10 pathways encompassed complement cascade, coagulation and lipid transport as the major categories. Note that ranking by other criteria, such as the percentage of genes in each pathway or p values corrected for multiple testing by two different methods, gave different top pathways. The detailed pathway tables are given in Additional file 2: Table S7.
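
For readers unfamiliar with enrichment p values, the following is a minimal sketch of an over-representation test. It assumes a one-sided hypergeometric test, which is a common choice for this kind of analysis; the source does not state which test or background FunRich applied here, and all counts in the example are hypothetical.

```python
from math import comb, log10


def enrichment_p_value(background, in_pathway, submitted, overlap):
    """One-sided hypergeometric p value, P(X >= overlap).

    background: number of genes/proteins in the reference set
    in_pathway: how many of those are annotated to the pathway
    submitted:  size of the submitted protein list
    overlap:    submitted proteins annotated to the pathway
    """
    upper = min(in_pathway, submitted)
    total = comb(background, submitted)
    tail = sum(
        comb(in_pathway, k) * comb(background - in_pathway, submitted - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 108 submitted proteins, 10 of which fall into a
# pathway with 100 members, against a background of 20000 genes.
p_value = enrichment_p_value(20000, 100, 108, 10)
score = -log10(p_value)  # the quantity used to rank pathways in a bar plot
```

Because the expected overlap in this toy example is well below one protein, an overlap of 10 gives a very small p value and therefore a large -log10 score.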
For the comparisons of the control group with each of the three disease categories (LG, HG and IPMNC), Table 1 gives the top 5 pathways enriched among proteins with the highest mean in controls and, separately, among proteins with the highest mean in LG, HG or IPMNC. The pathways were sorted according to their p values of enrichment. The pathways for controls are similar across the comparisons, with complement and coagulation as the main pathways among proteins having the highest mean in controls. In LG, however, integrin signalling and proteoglycan-based signalling come up first, while towards HG and IPMNC the enriched pathways also shift towards complement and coagulation.

Stage wise comparison of patterns in protein expression (linear trend analysis)
To find out whether there are patterns in serum protein expression across the stages from controls to LG to HG to IPMNC, we generated line graphs. Taking category-specific averages gave 4 values per protein (Controls, LG, HG and IPMNC), which we plotted to see how steeply they decrease or increase across the four classes of samples. We found several proteins with such patterns, and these were further filtered to include only the proteins which passed the ANOVA cutoff of 0.05. Twelve proteins with a clear increasing or decreasing trend were found in the dataset. The line graphs of these proteins are presented in Fig. 3.
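
The trend test named in this study can be sketched as follows. This is a minimal implementation of the standard Mann-Kendall test with the usual tie correction and normal approximation; how the authors ordered their input values is not described in the text, so the example series is hypothetical. Under this approximation a series of only four values cannot reach p < 0.05, so the sketch assumes the test is run on a longer ordered series.

```python
import math


def mann_kendall(series):
    """Return (S, z, two-sided p) for a monotonic trend in an ordered series."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference

    # Variance of S, reduced for groups of tied values.
    tie_counts = {}
    for value in series:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    var_s = n * (n - 1) * (2 * n + 5)
    for t in tie_counts.values():
        var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18.0

    # Continuity-corrected z statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return s, z, p


# Hypothetical abundances ordered from controls towards carcinoma.
series = [3.1, 3.4, 3.3, 3.9, 4.2, 4.0, 4.8, 5.1, 5.0, 5.6]
s_stat, z_stat, p_value = mann_kendall(series)

# Four perfectly increasing values, for comparison.
s_four, z_four, p_four = mann_kendall([1.0, 2.0, 3.0, 4.0])
```

A positive S with a small p value indicates an increasing trend and a negative S a decreasing one.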

ROC curve analysis
Biomarkers selected by S-Plot were verified by ROC curve analysis and area under the curve values (AUC) were calculated for each individual candidate. These biomarker panels for all comparisons are listed in Table 2.
Good AUC values were found for all proposed biomarkers, ranging from 0.739 (Mannan-binding lectin serine protease 1 to differentiate CTRL vs IPMNC) to 1.000 (KRT72 to differentiate CTRL vs IPMNC).
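
To make the AUC values easier to interpret, here is a minimal sketch of how the area under the ROC curve can be computed for a single marker. It uses the rank-based identity that the AUC equals the probability that a randomly chosen case has a higher value than a randomly chosen control, with ties counted as one half. The function name and the abundances are hypothetical, and this is not necessarily the software routine used in the study.

```python
def roc_auc(case_values, control_values):
    """AUC as P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical marker abundances (not study data).
perfect_auc = roc_auc([5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0])
partial_auc = roc_auc([3.0, 5.0, 6.0, 8.0], [1.0, 2.0, 4.0, 7.0])
```

An AUC of 1.000 means every case value lies above every control value, while 0.5 means no discrimination. For a marker that is decreased in cases, the function returns a value below 0.5, and the marker's performance is then described by one minus that value.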

Discussion
Benign intraductal papillary mucinous neoplasias (IPMN) are noninvasive epithelial tumours of the pancreas. These tumours are a common incidental finding at CT and MRI. Diagnosis for the most part depends on radiologic examination despite the unsatisfactory accuracy of these tools. More invasive procedures such as histopathological investigation can help the diagnosis. This includes the classification of IPMN into low-grade and high-grade dysplasia. Often, both types of dysplasia can be seen in the same lesion, suggesting that progression from low grade to high grade occurs in vivo. An LG dysplasia lesion has an 8% chance of turning into IPMN associated carcinoma, while an HG dysplasia lesion turns into IPMNC with a 25% chance over 10 years [10]. Pancreatic cancer has poor survival rates, which have improved only a little in recent years even in developed countries. This necessitates very long follow-up of the patients until they are no longer fit for major surgery. Long follow-ups pose great economic demands on the health care system. We analyzed serum samples by quantitative proteomics from three classes of patients (LG, HG, IPMNC) and compared them to healthy individuals. We made six different comparisons from these 4 categories of samples. Differentiating LG and HG from controls would provide minimally invasive biomarkers for early detection of IPMN, while the control vs carcinoma comparison can provide screening biomarkers. IPMN is mostly asymptomatic or causes non-specific symptoms such as jaundice, pancreatitis and abdominal pain, which may already be signs of late stage disease. It is difficult to diagnose IPMN based on these symptoms alone. We did further comparisons such as LG vs HG, LG vs IPMNC and HG vs IPMNC.
These comparisons are particularly important for clinicians, as it is not known which of the IPMN (and what grade of dysplasia) will eventually progress to carcinoma. Patients with LG can rather safely be followed up, whereas patients with HG are often recommended surgery because of the high risk of developing cancer. Four hundred thirty-six proteins (at least 2 unique peptides) were quantified from the serum samples of the above described groups of patients and healthy individuals. After analyzing the serum proteomic profile by PCA and OPLS-DA modeling, we also calculated AUC values of the ROC curve for the proteins found to be significantly different in the S-Plot.
The first three comparisons that we made among the sample classes were control vs LG, HG or IPMNC. The control vs LG or HG comparisons would be expected to provide early detection biomarkers. Other comparisons, such as LG vs IPMNC and HG vs IPMNC, give information on which LG or HG lesions might develop into carcinoma in the context of serum protein expression values. Further, biomarkers from the LG vs HG vs IPMNC comparisons will help make better clinical decisions about resection of HG vs IPMNC. Top proteins which could differentiate the controls from LG were Protein WWC2 (WWC2), kininogen-1 (KN1) and Insulin-like growth factor-binding protein 3 (IGFBP-3), which were increased in LG, and guanine nucleotide exchange factor DBS (DBS), which was decreased in LG (Additional file 2: Table S2). Note that Kininogen-1 and Clusterin had the best p(corr)[1] and p[1] parameters to be suggested as potential biomarkers (Table 2); however, to get a glimpse of altered events and pathways, even relatively low confidence changes are considered here. WWC2 negatively regulates cell proliferation by modulating the Hippo pathway [11]. YAP-1 is among the proteins which control pancreatic cancer initiation [12], and WWC2 is an inhibitor of YAP-1. It is plausible that as YAP-1 activity increases in LG, WWC2 is upregulated to counter its effects. WWC2 protein was consistently increased in LG, HG and IPMNC compared to controls, and it could also discriminate IPMNC from controls by S-Plot. Tissue specific upregulation of IGFBP3 in PDAC compared to IPMN has been found previously [13]. IGFBP3 has been shown to discriminate early stage IPMNC from healthy controls [14,15], and our results show that already at the LG stage of IPMN this protein discriminates patients from healthy controls.
In the comparison of controls vs LG, one of the key proteins decreased in LG was SPINK2. This protein was also decreased in HG and IPMNC; however, there was not much difference between LG and HG. Its family member SPINK1 is a negative regulator of autophagy, and mutations in SPINK1 are known to be associated with hereditary pancreatitis [16]. Variable expression of SPINK2, on the other hand, is known to modulate the response to apoptotic stimuli in vitro [17]. Additionally, Kininogen-1 was found to be the top S-Plot protein in the comparisons of controls vs LG and HG, and compared to controls it was still higher in IPMNC. This protein is part of the kallikrein-kinin system of blood coagulation, and thromboembolic disease is one of the major complications of pancreatic adenocarcinomas [18].
Cytochrome c-type heme lyase (HCCS) was the top protein predicted by S-Plot to be discriminatory between HG and controls. Looking at the data for this protein in the whole dataset, there was a clear trend of decrease from controls to dysplasia to carcinoma. HCCS was lower in both LG and HG compared to controls. HCCS stabilizes free cytochrome c [19]; therefore, it might function in the evasion of apoptosis by various types of cancer cell. In the comparison of HG vs IPMNC, S-Plot gave 6 proteins, of which the highest fold change was observed for TDP1. TDP1 depletion can lead to cell death, and its upregulation in IPMNC is expected, as it repairs the DNA damage introduced by tumorigenesis events and by various chemotherapy drugs [20]. Hemoglobin alpha and beta were also reduced in HG compared to controls. In as many as 7% of patients with various solid tumors including pancreatic cancer, disseminated intravascular coagulation is observed [21], which can lead to hemolytic anemia, potentially reducing hemoglobin levels. Cases of hemolytic anemia in pancreatic cancer have been reported [22]. On the other hand, Claspin had higher levels in HG compared to controls, and it was also high in LG and IPMNC. It has been shown that Claspin downregulation can sensitize pancreatic cancer cells to drug-induced DNA damage [23]. In line with this observation, a base excision repair enzyme, DNA polymerase epsilon subunit 3, was found to be higher in HG compared to controls in our dataset. This enzyme is involved in recombinatorial processes in the cells [24], and it is expected to be higher in dysplasia and cancer cells due to the higher recombinatorial activity commonly found in cancers.
In comparing LG and HG with IPMNC, some of the top changing proteins were Cancer/testis antigen 22 (CT22) and Rhodopsin kinase, which were decreased and increased in LG compared to IPMNC, respectively. CT22 is also known as sperm surface protein 17 and belongs to the class of cancer/testis antigens, which show high promise as oncologic biomarkers. CT22 has been shown to be a good diagnostic and prognostic biomarker for subsets of epithelial ovarian cancer [25], and several patents concerning breast and other cancers have been published. It was also less abundant in HG compared to IPMNC. Among the proteins with the largest differences between HG and IPMNC, Cathelicidin antimicrobial peptide was reduced in IPMNC compared to HG. It has been shown in mice that its deficiency can worsen acute pancreatitis by modulating inflammation of the pancreas [26]. Its deficiency in pancreatic cancer may be a mediator of the sustained inflammation driving cancer growth.
To find out the events in serum factor regulation from healthy controls to LG to HG to IPMNC, all proteins were used for drawing line graphs. We found 12 proteins which had significant ANOVA values and which showed a pattern of consistent increase or decrease across the progression of IPMN. Eight of these proteins increased from control to LG to HG to IPMNC, while the four others decreased in the same order (Mann-Kendall p value < 0.05). Among the consistently increased proteins was C-reactive protein (CRP), which has previously been shown to discriminate between pancreatic cancer and healthy individuals [27]. However, it is well known that CRP is not specific to pancreatic cancer, and numerous other causes such as bacteremia, SIRS, rheumatoid disease and trauma can also lead to its upregulation in serum [28,29]. Nevertheless, it can be part of a panel together with other proteins to create a surrogate endpoint for clinical entities. CRP was not statistically significantly different between CTRL, LG and HG (ANOVA p value > 0.05); however, it was significantly different between CTRL and IPMNC (ANOVA p value < 0.05). It is possible that CRP can be used together with a panel of proteins as a detection and/or screening biomarker to detect early stage PDAC; however, this claim warrants further validation. Other consistently increased proteins included four complement proteins (C2, C9, CFH, CFB). Complement proteins have previously been shown to be elevated in the serum of pancreatic cancer and acute pancreatitis patients [30]. Complement cascade activation leads to persistent inflammation and activates cancer specific pathways [31,32]. Complement factor B has been suggested as a supplementary marker to CA 19-9 to diagnose PDAC. The complement system has been shown to be growth-promoting for cancer cells [33], and its inhibitors are being considered for anticancer therapy [34].
Notably, complement activation was the main pathway enriched when comparing controls to LG as well as to HG and IPMNC in our dataset (Table 1). APOE and APOC-II were among the proteins which were consistently decreased, being highest in controls. These two are co-expressed proteins which function in the transport of lipids. APOE also inhibits the aggregation of platelets [35]. Platelet aggregation is utilized by various cancer cells as a strategy to modulate thrombosis and haemostasis to promote their own growth [36]. Up- or downregulation of serum factors might also be needed by various cancers to sustain their growth and migration, and our study identifies these factors in a patterned manner. The 12 proteins with a significantly increasing or decreasing trend from CTRL to IPMNC were not proposed as biomarkers in our study. This is because these proteins, although their means were statistically significantly increasing or decreasing across the groups, had larger variation across the groups. This variation makes their values more spread out across individuals and prohibits their use as biomarkers.
This was a single-institution study, which might raise questions about the generalizability of the findings to the Finnish population. Our findings may not hold for all centers in Finland; however, there was no active enrollment process, which is generally a source of bias. Instead, all patients for whom preoperative serum samples were available (about half of all resected patients) were enrolled in the study. This largely excludes selection bias due to other factors related to decision making. However, we acknowledge that unknown and unidentified sources of bias, of varying magnitude, may still be present and may prevent complete generalization of the findings to the population level. Nevertheless, our study provides a rich source of biomarker candidates that can be validated in a targeted manner in larger, multi-institutional studies.
We imposed stringent parameters on the OPLS-DA S-plot to arrive at panels of potential serum biomarkers (Table 2) that discriminate CTRL from LG with 2 proteins, CTRL from HG with 1 protein, CTRL from IPMNC with 3 proteins and LG from IPMNC with 3 proteins. We could not differentiate LG from HG or HG from IPMNC. Low- and high-grade IPMN are very close phenotypes, as are high-grade IPMN and carcinoma, which makes them harder to differentiate by serum proteomics. However, differentiating LG from IPMNC by serum markers is clinically very useful. Trend analysis indicated upregulation of the complement pathway and downregulation of apolipoproteins in the transformation from healthy to LG to HG to carcinoma. The 12 proteins that were significant by the Mann-Kendall trend test need to be further validated to provide more information about the progression and development of neoplastic transformation related to IPMN. We recognize that this is a pilot study that needs to be validated by future studies, but it is the first study to reveal clearly increasing and decreasing expression trends from healthy to LG to HG to IPMNC, and it also provides glimpses of altered pathways.

Conclusions
In conclusion, we propose several biomarker panels (Table 2) as a means to differentiate healthy controls from LG IPMN, HG IPMN and IPMN-associated carcinoma. LG can also be differentiated from IPMNC, which is clinically very useful and might aid clinicians in differential diagnosis in the future. In addition, we identify serum proteins that show increasing or decreasing trends in a grade-specific manner. This study serves as a precursor for further studies to streamline the validation of these biomarkers and to probe the role of serum factors in the tumorigenesis of pancreatic cancer, starting from low-grade IPMN and eventually progressing to IPMN-associated carcinoma. It will be helpful to further study the serum proteins that can reflect the transformation from healthy to LG IPMN to HG IPMN to IPMNC.

Additional files
Additional file 1: Figure S1. Additional file 2: Table S1A. Patient metadata. Table S1B. List of all proteins quantified. Table S1C. List of proteins for every binary comparison. Table S2. S-Plot proteins of CTRL vs LG. Table S3. S-Plot proteins of CTRL vs HG. Table S4. S-Plot proteins of CTRL vs IPMNC.