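Sa Shixuan International Research Center for Big Data Analysis and Management: First International Workshop on Big Data Analysis and Management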

Published: 2012-07-13

July 19, 2012, Beijing

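In the 21st century, information technology has advanced by leaps and bounds, and the world has entered the era of big data. To meet the challenges of big data effectively while making full use of the opportunities it brings, international academia, industry, and even governments are actively proposing countermeasures and drawing up strategic plans. For example, the international conference on Extremely Large Databases (XLDB) has been held five times in a row in the United States, Europe and elsewhere, and the U.S. government has invested USD 200 million to establish the "Big Data Research and Development Initiative".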

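Against this background, Renmin University of China, building on the Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education, has established the Sa Shixuan International Research Center for Big Data Analysis and Management. On July 19 this year it will hold the "First International Workshop on Big Data Analysis and Management" at the Yifu Conference Center of Renmin University of China. The workshop will explore in depth topics such as the challenges of big data applications in different domains and the construction of big data infrastructure, in order to advance big data research and development.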

Time: July 19, 2012, 9:00-17:00

Venue: First Lecture Hall, Yifu Conference Center, Renmin University of China, Haidian District, Beijing

Schedule

Time | Speaker | Title

8:00-9:00 | Registration

KEYNOTE TALK
SESSION CHAIR: Xiaofang Zhou
9:00-9:45 | Ooi Beng Chin | Design and Development of an IT Infrastructure for Data-intensive Applications and Analysis
9:45-10:15 | Coffee Break

INVITED TALK
SESSION CHAIR: Cuiping Li
10:15-10:45 | Christian S. Jensen | Managing High-Velocity Mobile Location Data
10:45-11:15 | Ling Liu | From Trajectory Mining to Big Data Analytics
11:15-11:45 | Zhenkun Yang | Building a scalable RDBMS from scratch
11:45-12:15 | Lei Chen | Whom to Ask? Jury Selection for Decision Making Tasks on Microblog Services
12:15-14:00 | Lunch

PROJECT REPORT
SESSION CHAIR: Jiaheng Lu
14:00-14:30 | Jianmin Wang | Research Progress on Unstructured Data Management in China
14:30-15:00 | Xiaofang Zhou | Research project on Web information extraction, analysis and management
15:00-15:30 | Coffee Break

PANEL DISCUSSION
SESSION CHAIR: Ling Liu
15:30-17:00 | Weiyi Meng, Christian S. Jensen, Ling Liu, Ke Wang, Chengqi Zhang, Zhenkun Yang, Jianmin Wang, Xiaofang Zhou, Lei Chen | Big Data Analysis and Management: Opportunities and Challenges

Talk Details

Talk 1: Design and Development of an IT Infrastructure for Data-intensive Applications and Analysis

Abstract: Web 2.0 applications and enterprise applications are generating massive amounts of different types of data at an unprecedented scale, and the big data problem represents a new challenge that has crippled the IT infrastructures of many modern enterprises. With the rapidly growing popularity of cloud computing, driven by its promise of elastic scalability and a pay-per-use model, IT infrastructures are likely to move from enterprise settings to the cloud. However, although cloud computing has demonstrated its usefulness for Web-centric, information-oriented applications, it has not reached the level of maturity needed to support a large class of enterprise applications that rely on a foundational data and information layer based on Database Management Systems (DBMSes). In this talk, I shall discuss our effort in building a cloud infrastructure for big data management, processing and analytics driven by two application domains, namely social networking and enterprise data management.


Speaker bio: Prof. Ooi Beng Chin is Dean of the School of Computing (SoC) and Director of the Interactive Digital Media Institute (IDMI) at the National University of Singapore (NUS). He has served as a PC member for international conferences such as ACM SIGMOD, VLDB, IEEE ICDE, WWW, and SIGKDD, and as Vice PC Chair for ICDE'00, '04 and '06, co-PC Chair for SSD'93 and DASFAA'05, PC Chair for ACM SIGMOD'07, Core DB PC Chair for VLDB'08, and PC co-Chair for IEEE ICDE'12. He was an editor of the VLDB Journal and IEEE Transactions on Knowledge and Data Engineering, and a co-chair of the ACM SIGMOD Jim Gray Best Thesis Award committee. He is serving as the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (TKDE) (2009-2012), an editor of the Distributed and Parallel Databases journal, an advisory board member of SIGMOD, and a trustee board member and executive of the VLDB Endowment.

Talk 2: Managing High-Velocity Mobile Location Data

Abstract:
Deployments of networked sensors enable online applications that feed on real-time sensor data. For example, a variety of applications may exploit a database of up-to-date locations of very large populations of mobile users. This scenario calls for techniques that support workloads containing queries with low-latency requirements as well as massive volumes of updates. Such techniques should exploit the parallelism offered by modern processors. To do so, it is essential to avoid contention among parallel hardware threads. This talk covers two solutions, TwinGrid and PGrid. TwinGrid maintains two copies, or snapshots, of the data: one for the relatively long-duration queries and one for the frequent and very localized updates. The snapshot that receives the updates is frequently made available to queries by means of the C library memcpy function. PGrid avoids keeping two copies, but instead relaxes the query semantics so that updates and queries can occur in parallel. By maintaining two copies of data updated with non-local updates, so-called freshness semantics can be guaranteed. Both solutions use spatial grid indexes and secondary hash indexes on object identifiers.

Speaker bio: Prof. Christian S. Jensen is a Professor of Computer Science at Aarhus
University, Denmark. He was previously at Aalborg University for two decades, and he recently spent a one-year sabbatical at Google Inc., Mountain View. His research concerns data management and data-intensive systems, with a focus on temporal and spatio-temporal data management. He is an ACM Fellow and an IEEE Fellow, and a member of the Royal Danish Academy of Sciences and Letters and the Danish Academy of Technical Sciences. He has received several national and international awards for his research. He is currently vice-chair of ACM SIGMOD and an editor-in-chief of The VLDB Journal.

Talk 3: From Trajectory Mining to Big Data Analytics

Abstract: A trajectory can be defined as a
time-ordered set of states of a dynamical system. The most common example of a trajectory is the path that a moving object follows through space as a function of time. A trajectory can be described mathematically either by the geometry of the path, or as the position of the object over time. Mining moving object trajectory data has been gaining significant interest in recent years. However, existing approaches to trajectory clustering are mainly based on density and Euclidean distance measures. We argue that when spatial clustering of mobile object trajectories is targeted at road-network-aware, location-based applications, density and Euclidean distance are no longer effective measures. This is because traffic flows in a road network and the flow-based density characterization become important factors for finding interesting trajectory clusters of mobile objects travelling in road networks. In this talk, I will briefly introduce our research project NEAT, a fast and effective approach to clustering spatial trajectories of moving objects travelling on road networks. Our method takes into account the physical constraints of the road network, the network proximity and the traffic flows among consecutive road segments. Trajectory clusters discovered by NEAT are groups of sub-trajectories that describe both dense and highly continuous traffic flows of mobile objects. Our experimental results with mobility traces generated using different scales of real road network maps demonstrate that the NEAT approach is highly accurate and runs orders of magnitude faster than existing Euclidean density-based trajectory clustering approaches. I will conclude the talk by discussing moving object trajectory mining in information networks, such as consumer networks, healthcare information networks, and social networks.

Speaker bio: Prof.
Ling Liu is a full Professor in the School of Computer Science at Georgia Institute of Technology. She directs the research programs in the Distributed Data Intensive Systems Lab (DiSL), examining various aspects of large-scale data-intensive systems with a focus on performance, availability, security, privacy, and energy efficiency. Prof. Liu and her students have released a number of open source software tools, including WebCQ, XWRAP, PeerCrawl, and GTMobiSim. She has published over 300 international journal and conference articles in the areas of databases, distributed systems, and Internet computing. Prof. Liu is a recipient of the 2012 IEEE Computer Society Technical Achievement Award and an Outstanding Doctoral Thesis Advisor award from Georgia Institute of Technology. She has also served as general chair and PC chair of several IEEE and ACM conferences in the data engineering and distributed computing fields, and has served on the editorial boards of over a dozen international journals. Currently Prof. Liu is on the editorial boards of Distributed and Parallel Databases (Springer), Journal of Parallel and Distributed Computing (JPDC), IEEE Transactions on Service Computing (TSC), and ACM Transactions on Web (TWEB). Dr. Liu's current research is primarily sponsored by NSF, IBM, and Intel.

Talk 4: Building a scalable RDBMS from scratch

Abstract: Even though big data at the scale of petabytes or even exabytes is
attracting more and more attention, the RDBMS is still the BASE of our society. In this talk, I will share with you OceanBase (http://oceanbase.taobao.org/), a scalable RDBMS built from scratch. OceanBase is a semi-distributed storage system for managing structured data that supports ACID transactions as well as many features of the relational model. With its shared-nothing architecture, it can easily scale on the fly to hundreds of billions of records across hundreds of commodity servers, with fault tolerance. By eliminating random disk writes, it is well matched to commodity solid-state disks (SSDs) and thus enables much higher transactions per second (TPS) and queries per second (QPS). OceanBase has provided relational database services for more than a dozen projects in the product system of Taobao.com and serves billions of real-time queries every day.

Speaker bio: Dr. Zhenkun Yang (yangzhenkun@gmail.com) is a
Senior Researcher with Taobao.com. In recent years, his research interests have been distributed storage and computing systems. He is now the chief architect of OceanBase, an open source scalable relational database at Taobao.com. Before joining Taobao.com, he was a Senior Scientist with Baidu.com, a Lead Researcher with Microsoft Research Asia and a Chief Researcher with Lenovo Research. He received his bachelor's and master's degrees from the Department of Mathematics, Peking University. After receiving his PhD degree from the Department of Computer Science in 1993, he joined the faculty of the Institute of Computer Science and Technology, Peking University, and became a full professor in 1997. He received the Cheung Kong Scholar Award, Peking University, in 1999. He was the fourth-named recipient of the First Class Award of the National Science and Technology Progress of China in 1995. He also won the First Class Award of Science and Technology Progress of Beijing in 1996, the National Youth Science and Technology Award of China in 1998, the Qiushi Eminent of the Chinese Academy of Science and Technology in 1998, and the Wusi Youth Award of Beijing in 2000.

Talk 5: Whom to Ask? Jury Selection for Decision Making Tasks on Microblog Services

Abstract: It is common to see people obtain knowledge on micro-blog services by asking others decision-making questions. In this talk, I will present our recent study on the Jury Selection Problem (JSP), which uses crowdsourcing for decision-making tasks on micro-blog services. Specifically, the problem is to enroll, under a limited budget, a subset of the crowd whose aggregated wisdom under a Majority Voting scheme has the lowest probability of producing a wrong answer (the Jury Error Rate, JER). The challenges of this problem lie in calculating the JER and in finding the optimal subset under a limited budget. Because individual error rates vary across the crowd, calculating the JER is non-trivial. In our study, we propose two efficient algorithms for it: a dynamic programming-based algorithm and a divide-and-conquer algorithm. For JSP, we formally propose two models, one for altruistic users (AltrM) and one for incentive-requiring users (PayM), who require extra payment when enrolled in a task. Based on these two models, we design efficient algorithms for JSP. The efficiency and effectiveness of our proposed algorithms are verified on both synthetic and real micro-blog data.
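
To make the Jury Error Rate concrete, here is a minimal Python sketch. It is not the algorithm from the talk. It assumes that jurors err independently, that each juror's error rate is known, and that the jury size is odd so majority voting cannot tie. The names `jury_error_rate`, `best_jury` and the `(error rate, cost)` juror tuples are hypothetical, and the exhaustive search over subsets only stands in for the efficient selection algorithms the abstract mentions.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple

# Hypothetical juror representation: (individual error rate, enrollment cost).
Juror = Tuple[float, float]


def jury_error_rate(error_rates: Sequence[float]) -> float:
    """Probability that a majority of independent jurors is wrong."""
    n = len(error_rates)
    if n % 2 == 0:
        raise ValueError("jury size must be odd so that majority voting cannot tie")
    # wrong[k] = probability that exactly k of the jurors seen so far are wrong.
    wrong = [1.0] + [0.0] * n
    for seen, e in enumerate(error_rates, start=1):
        # Go from high k to low k so each juror is counted exactly once.
        for k in range(seen, 0, -1):
            wrong[k] = wrong[k] * (1.0 - e) + wrong[k - 1] * e
        wrong[0] *= 1.0 - e
    # The vote is wrong when at least (n + 1) / 2 jurors are wrong.
    return sum(wrong[(n + 1) // 2:])


def best_jury(
    candidates: Sequence[Juror], budget: float
) -> Optional[Tuple[float, Tuple[Juror, ...]]]:
    """Brute-force search for the odd-sized jury within budget with the lowest JER."""
    best = None
    for size in range(1, len(candidates) + 1, 2):
        for jury in combinations(candidates, size):
            if sum(cost for _, cost in jury) > budget:
                continue
            jer = jury_error_rate([e for e, _ in jury])
            if best is None or jer < best[0]:
                best = (jer, jury)
    return best  # None when no juror is affordable
```

Under these assumptions, three jurors who are each wrong 20% of the time give a JER of 0.104. Calling `best_jury([(0.1, 1.0), (0.3, 1.0), (0.3, 1.0)], budget=3.0)` selects only the first juror, since adding the two weaker jurors would raise the error rate from 0.1 to 0.132. The dynamic program takes quadratic time in the jury size, while the subset search is exponential and only practical for very small candidate pools.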

Speaker bio: Prof. Lei Chen received the BS degree in computer science and engineering from Tianjin University, Tianjin, China, in 1994, the MA degree from the Asian Institute of Technology, Bangkok, Thailand, in 1997, and the PhD degree in computer science from the University of Waterloo, Waterloo, Ontario, Canada, in 2005. He is currently an associate professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. His research interests include crowdsourcing on social media, social media analysis, probabilistic and uncertain databases, and privacy-preserving data publishing. He has published more than 150 conference and journal papers, and received best paper awards at DASFAA 2009 and 2010. He is a PC track chair for ACM SIGMM 2011, ACM CIKM 2012, and IEEE ICDE 2012, and has served as a PC member for SIGMOD, VLDB, ICDE, SIGMM, and WWW. He is a member of the ACM and IEEE. He also serves as the chairman of the ACM Hong Kong Chapter.

Talk 6: Research Progress on Unstructured Data Management in China

Abstract: In 2010, under the national program "Core Electronic Devices, Advanced Chipsets and Fundamental Software Products" (abbreviated as HGJ in Chinese), China started supporting research and development activities on unstructured data management systems. This report gives an overview of these HGJ projects during the 11th Five-Year Plan period. In particular, we focus on large-scale unstructured data management systems, especially the research progress on their elastic system architecture, flexible transaction mechanism and behavior data mining.

Speaker bio: Prof.
Jianmin Wang received his Ph.D. in Computer Software from Tsinghua University in 1995. He is now a professor and doctoral supervisor in the School of Software, Tsinghua University. He has been a member of the experts group for the National "HGJ" major science and technology project of China, and a member of the experts group of the National 863 Program of China. His research covers data management and information systems, including unstructured data management, business process management, product data management, benchmarks and frameworks for database evaluation, novel watermarking, and digital rights management.

Talk 7: Research project on Web information extraction, analysis and management

Abstract: In China, we are currently launching a new 863 project called "Big Web information extraction, analysis and management in an open environment". In this project, we will address the challenges of Web information extraction, analysis and management involving Internet-scale data. The project will last three years and is expected to deliver real applications with petabyte-scale data.

Speaker bio: Xiaofang Zhou is a Professor of Computer Science at
the University of Queensland, and Head of the Data Engineering and Pattern Recognition Research Division at UQ. His research focus is finding effective and efficient solutions for managing, integrating and analyzing very large amounts of complex data for business, scientific and personal applications. He has been working in the areas of spatial and multimedia databases, data quality, high-performance query processing, Web information systems and bioinformatics, and has co-authored over 200 research papers, many published in top journals and conferences such as SIGMOD, VLDB, ICDE, ACM Multimedia, The VLDB Journal, ACM Transactions and IEEE Transactions. He is an Adjunct Professor at Renmin University of China, appointed under the Chinese National Qianren Scheme, and serves as the Director of the RUC-UQ Joint Lab on Data Engineering and Knowledge Engineering.