Speaker: Professor He Wenqing
Title: Data Adaptive Support Vector Machine with Application to Prostate Cancer Imaging Data
Abstract: Support vector machines (SVMs) have been widely used as classifiers in various settings, including pattern recognition, texture mining, and image retrieval. However, such methods face newly emerging challenges such as imbalanced observations and noisy data. In this talk, I will discuss the impact of noisy data and imbalanced observations on SVM classification and present a new data adaptive SVM classification method.
This work is motivated by a prostate cancer imaging study conducted at London Health Science Center. A primary objective of this study is to improve prostate cancer diagnosis, and thereby guide treatment, using statistical predictive models. The prostate imaging data, however, are highly imbalanced: the majority of voxels are cancer-free, while only a very small portion are cancerous. This imbalance typically skews the available SVM classifiers toward one class and thus produces invalid results. Our proposed SVM method uses a data adaptive kernel to reflect the imbalance in the observations; it takes into consideration the location of the support vectors in the feature space and thereby generates more accurate classification results. The performance of the proposed method is compared with that of existing methods in numerical studies.
Speaker: Professor Yi Yun
Title: Making Sense of Noisy Data: Some Issues and Methods
Abstract: Thanks to advances in modern data acquisition technology, massive data with diverse features and large volume are becoming more accessible than ever. The impact of big data is significant. While the abundant volume of data presents great opportunities for researchers to extract useful information for new knowledge and sensible decision making, big data also present great challenges. A very important, sometimes overlooked, challenge is the quality and provenance of the data. Big data are not automatically useful; they are often raw and involve considerable noise.
The challenges presented by noisy data with measurement error, missing observations, and high dimensionality are particularly intriguing. Noisy data with these features arise ubiquitously in various fields, including health sciences, epidemiological studies, environmental studies, survey research, and economics. In this talk, I will discuss the issues induced by noisy data and some methods of handling such data.