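To help enterprises gain a deeper understanding of the latest big data technologies in China and abroad, learn from more industry practice, and further advance big data innovation, industry applications, and talent development, the 2015 China Big Data Technology Conference (Big Data Technology Conference 2015, BDTC 2015) will be held on December 10-12, 2015 at the Crowne Plaza Beijing Sun Palace (北京新云南皇冠假日酒店). The conference is hosted by the China Computer Federation (CCF), organized by the CCF Big Data Expert Committee, and co-organized by the Institute of Computing Technology, Chinese Academy of Sciences, and ImapBox.
BDTC 2015 will run for three days. In addition to the main conference, 16 sub-forums are planned: 6 technical forums, including databases, deep learning, recommender systems, and security; 7 application forums covering finance, manufacturing, transportation and tourism, the Internet, healthcare, education, and network communications; and 3 hot-topic forums on policy, regulation and standardization, data markets and trading, and social governance. Nearly 100 leading international experts and front-line practitioners in big data technology will be invited for in-depth discussion of popular technologies and industry practice, including Spark, Kudu, PostgreSQL, YARN, HBase, machine learning/deep learning, and recommender systems.
The conference has invited Jin Rong, one of the heads of Alibaba iDST (Institute of Data Science and Technology) and a tenured professor at Michigan State University, as a plenary speaker. He will deliver a keynote titled "Randomized Algorithms for Big Data: Making the Impossible Possible".
Ahead of the conference, Jin Rong told an ImapBox reporter that randomized machine learning algorithms have received wide attention in recent machine learning research because of their high computational efficiency. However, owing to inherent limitations of randomized algorithms, they cannot make very effective use of large-scale data in many learning tasks (for a sense of scale, Alibaba's e-commerce platform receives billions of service requests every day). At the conference he will use two examples to show how side information and prior knowledge can overcome these limitations: with only a slight modification to the algorithms, the effectiveness of randomized machine learning algorithms can be improved dramatically. He will also present successful applications of randomized machine learning methods at Alibaba.
Professor Jin Rong holds a Ph.D. from Carnegie Mellon University and has long worked on statistical machine learning, focusing on big data analysis and its applications in areas such as Internet information retrieval and e-commerce. He has proposed a series of original algorithms and theories in stochastic optimization, online learning, kernel learning, metric learning, semi-supervised learning, active learning, and crowdsourcing. He has published more than 200 papers in international conferences and journals, including 32 papers in top journals of the field such as JMLR, TPAMI, and PNAS and 147 papers in top international conferences such as ICML, NIPS, and COLT; his work has been cited more than 10,000 times by other researchers. He has served as an area chair for top international conferences such as NIPS and SIGIR and as a senior program committee member for top conferences such as KDD, AAAI, and IJCAI. He is a recipient of the NSF CAREER Award from the U.S. National Science Foundation.
The following is the transcript of the interview with Professor Jin Rong:
Technical practice
ImapBox: Please introduce your company's business, the value of big data to that business, and your department's responsibilities.
Jin Rong: The goal of our BU is to develop state-of-the-art machine learning and data mining algorithms to support the key technologies of Alibaba, including search, recommendation, business data analysis, and sales forecasting.
ImapBox: Which big data technologies have you used in your projects? What are you satisfied and dissatisfied with about them?
Jin Rong: The key technologies we have utilized are large-scale optimization and machine learning. Although numerous efforts are devoted to large-scale optimization and machine learning, they are limited in three aspects: first, most efforts are devoted to developing computing infrastructure for large-scale optimization; second, most algorithms are unable to handle large-scale and high-dimensional data simultaneously; third, most machine learning algorithms are unable to deal effectively with noisy data, which is quite common in industry.
ImapBox: What are the main challenges in putting big data into practice in your industry?
Jin Rong: The key challenge for individual developers is the lack of computing resources. Currently, Alibaba offers the general public a powerful distributed environment that makes it possible for individual developers to perform large-scale data analysis. In particular, this platform offers powerful tools for large-scale optimization and large-scale machine learning.
ImapBox: As far as you know, what mistakes tend to cause enterprises' big data efforts to fail?
Jin Rong: A common mistake that I have observed is inferring causal relations from noisy data. For instance, we might find from the estimated conditional probabilities that male clients are more likely to search for female products than female clients are. We later found that this was due to the fact that many Taobao accounts are owned jointly by couples and, for some reason, only the males were listed as the owners of the accounts.
Technology trends
ImapBox: New technologies in big data are developing quickly. Looking at the big data industry as a whole, which technology trends do you think deserve attention?
Jin Rong: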
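ImapBox: In your industry, which technologies are you mainly watching and researching at the moment, and why are you optimistic about them?
Jin Rong: Large-scale deep learning, or more generally, learning non-linear prediction functions from massive amounts of data.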
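ImapBox: Talent is directly tied to the success of big data projects. What experience can you share about building a big data team?
Jin Rong: To build a strong data science team, the key is to combine people who have a good understanding of the business with people who have a solid background in machine learning and data mining.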
ImapBox: What qualities do you think an excellent data scientist needs?
Jin Rong: I have noticed that although many fresh graduates have received a good education in machine learning and data mining, they are not good at problem solving, particularly when encountering unexpected difficulties. A good data scientist should be able to find the source of a problem, particularly when the data to be analyzed is complex and seems to support conflicting conclusions.
ImapBox: Please tell us about the topic you will be presenting at this conference.
Jin Rong: Exploiting randomized algorithms for large-scale optimization.
We continue to encounter explosive growth in data: the number of web pages grew from 300 million in 1997 to 50 billion in 2013; about 10 billion images are indexed by Google and 6 billion videos are indexed by YouTube; Alibaba's e-commerce platform receives billions of requests on a daily basis. This data explosion poses a great challenge for data analysis. Randomized algorithms have attracted significant interest in recent studies of machine learning, mostly due to their computational efficiency. On the other hand, formal limitations of randomized algorithms have been established for various learning tasks, making them less effective in exploiting the massive amount of data that is available to computer programs. In this talk, I will discuss, based on two examples, how to overcome the limitations of randomized machine learning algorithms by exploiting either side information or prior knowledge of the data. We have shown, both theoretically and empirically, that with a slight modification, it is possible to dramatically improve the effectiveness of randomized algorithms for machine learning. I will also introduce successful cases of applying randomized algorithms at Alibaba.
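
As a generic illustration of why randomized algorithms are computationally attractive for large-scale optimization, the following minimal sketch runs stochastic gradient descent on a least-squares problem, reading one randomly sampled example per step instead of the full dataset. It is not the method from the talk, which the interview does not describe: the synthetic data, the function name `sgd_least_squares`, the step size, and the step count are all assumptions made for this example.

```python
import random


def sgd_least_squares(xs, ys, steps=20000, lr=0.01, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent.

    Each step samples ONE example uniformly at random, so the cost per
    step does not depend on the dataset size. Hypothetical helper written
    for illustration; the hyperparameters are assumptions.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        i = rng.randrange(n)             # pick a random example index
        err = (w * xs[i] + b) - ys[i]    # residual on that single example
        w -= lr * err * xs[i]            # gradient of 0.5 * err**2 w.r.t. w
        b -= lr * err                    # gradient of 0.5 * err**2 w.r.t. b
    return w, b


# Synthetic data (an assumption of this sketch): y = 3x + 1 plus small noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(10000)]
ys = [3.0 * x + 1.0 + data_rng.gauss(0.0, 0.1) for x in xs]

w, b = sgd_least_squares(xs, ys)
print(f"w = {w:.2f}, b = {b:.2f}")
```

Because each update reads a single example, the per-step cost stays constant as the dataset grows; the price is noisier iterates than full-batch gradient descent would produce.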
ImapBox: Which attendees most need to know about these topics? What problems can your talk help the audience solve?
Jin Rong: Anyone interested in large-scale learning will be interested in this topic. The material presented in my talk will help people find ways to solve large-scale optimization problems without having to resort to a distributed computing environment.
ImapBox: What are your expectations for BDTC 2015 and for the topics the other speakers will share?
Jin Rong: I would love to learn about big data topics in other fields.
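The 9th China Big Data Technology Conference will be held in Beijing on December 10-12, 2015. In addition to the main conference, there will be 16 sub-forums: 6 technical forums, including databases, deep learning, recommender systems, and security; 7 application forums, including finance, manufacturing, transportation and tourism, the Internet, healthcare, and education; and 3 hot-topic forums. Tickets are currently discounted, so book early.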
This is an original ImapBox article and may not be reproduced without permission. To request permission, contact market#csdn.net (replace # with @).