|Unit Name||BIG DATA ANALYSIS AND PATTERN RECOGNITION|
|Unit Duration||24 weeks|
Doctor of Engineering
Duration 3 years
|Unit Creator / Reviewer||Dr Tony Auditore|
Total Program Credit Points 120
|Mode of Delivery||Online or on-campus.|
20 hours per fortnight:
Lecture - 1 hour
Tutorial - 1 hour
Assessments / Practical / Lab - 2 hours (where applicable)
Personal Study recommended - 16 hours (guided and unguided)
Unit Description and General Aims
Large amounts of data (‘big data’) in the form of digital records have the potential to be transformed into useful research input. That transformation can be realized through the application of computer analysis. Cognitive computers can process huge amounts of data but are less effective at making judgment-based decisions. Big data analysis is a relatively new discipline which interfaces with statistics, database technology, and pattern recognition. It is concerned with the analysis of large databases to find relationships which are of interest or of value to researchers. The sheer size of the data involved creates new challenges for pattern recognition.
This unit equips candidates with procedural skills for researching massive amounts of data to identify patterns and correlations, a process referred to as ‘big data’ analysis.
At the conclusion of this unit students should be able to: identify the significance of big data to emerging interdisciplinary research; comprehend the big data analytic process; have an awareness of the challenges within big data analytics; understand the development and techniques for the design of systems capable of performing a given recognition task for a specific research application; have an awareness of the fundamental problems in pattern recognition system design; and interpret large data sets and develop pattern recognition conclusions and recommendations.
On successful completion of this Unit, students are expected to be able to:
- Evaluate the significance of big data to emerging interdisciplinary research and prediction of the future.
Bloom’s Level 5
- Justify the big data analytic process by defining the research requirement and taking an integrated solution approach.
Bloom’s Level 5
- Evaluate the challenges within big data analytics (data complexity; computational complexity; and system complexity).
Bloom’s Level 5
- Hypothesise the development and techniques for the design of systems capable of performing a given recognition task for a specific research application.
Bloom’s Level 6
- Express the fundamental problems in pattern recognition system design (representation of input data; pre-processing and feature extraction problem; and determination of optimum decision procedures).
Bloom’s Level 3
- Infer and interpret sources of big data and develop pattern recognition conclusions and recommendations.
Bloom’s Level 2
The cognitive domain levels of Bloom’s Taxonomy:
|Bloom's level||Bloom's category||Description|
|1||Remember||Retrieve relevant knowledge from long-term memory by recognising, identifying, recalling and retrieving.|
|2||Understand||Construct meaning from instructional messages by interpreting, classifying, summarising, inferring, comparing, contrasting, mapping and explaining.|
|3||Apply||Carrying out or using a procedure in a given situation by executing, implementing, operating, developing, illustrating, practicing and demonstrating.|
|4||Analyse||Deconstruct material and determine how the parts relate to one another and to an overall structure or purpose by differentiating, organising and attributing.|
|5||Evaluate||Make judgments based on criteria and standards by checking, coordinating, evaluating, recommending, validating, testing, critiquing and judging.|
|6||Create||Put elements together to form a coherent pattern or functional whole by generating, hypothesising, designing, planning, producing and constructing.|
|Assessment Type||When assessed (eg. After Topic 5)||Weighting (% of total unit marks)||Learning Outcomes Assessed|
Type: Multi-Choice Test
Word length: n/a
Questions from the content covered over the first weeks of instruction.
|After topic 4||10%||1, 2|
Type: Data analysis application / Self-assessment
Word length: 1500
A ‘self-originated’ question by the student to be answered within the boundaries, scope and study reference material provided by the examiner.
|Due after Topic 9||35%||3,4|
Type: Design plan
Word length: 3000
From the Industry Project DEng700 or a data source stipulated by the lecturer, develop a clear and concise description of the big data and pattern recognition model. Identify the important factors that affect the application of the model and develop mitigating solutions to overcome possible adverse conditions.
Prescribed and Recommended Readings
The required textbook provides important references in each chapter which are relevant to the subject matter. These references and those provided by the instructor will form the basis of the study material. The following textbook serves as a study guide and as a future reference for statistical theory, numerous research methods, calculations and visuals.
- Li, K-C., Jiang, H., Yang, L. T., Cuzzocrea, A. (2015) Big Data: Algorithms, Analytics, and Applications. CRC Press. ISBN: 978-1-4822-4055-9
In addition to the above textbook, several useful related reference materials may be obtained online from published journals and websites. These resources may be obtainable from www.academia.edu or from reputable research databases, e.g. Scopus, Web of Science, SAGE.
Software Reference Material
Software can be applied in the processing of data and the professional presentation of computed results. There are numerous software packages which can be applied. For convenience and affordability, the Microsoft Excel add-on XLSTAT-Base is proposed.
- The proposed XLSTAT-Base software is for data mining, machine learning, tests, data modelling and visualization. This software tool can be applied for data preparation and visualization, parametric and nonparametric tests, modelling methods (ANOVA, regression, generalized linear models, mixed models, nonlinear models), data mining features (principal component analysis, correspondence analysis) and clustering methods (Agglomerative Hierarchical Clustering, K-means). XLSTAT-Base also features machine learning methods (association rules, regression and classification trees and K-Nearest Neighbours), partial least squares regression and more. It is IET’s viewpoint that XLSTAT-Base will be a comprehensive and affordable research tool for the candidate’s final research project. Further reading can be obtained from the website: https://www.xlstat.com/en/solutions/base.
- Alternative software, such as Maple, Quantum XL or MATLAB, may be applied to smaller case studies.
Topic 1 and 2
Students will be introduced to the subject matter of big data analytics by identifying the significance of big data to emerging interdisciplinary research and prediction of the future. Students will comprehend the big data analytic process by: clarifying and defining the research requirement; working efficiently with the data; adopting a top-down data management approach; and achieving the goal through an integrated solution approach. Content will cover relevant sections of the first 4 chapters, relating to “Big Data Management”, of the prescribed book, which include:
- Scalable Indexing for Big Data Processing.
- Scalability and Cost Evaluation of Incremental Data Processing Using Amazon’s Hadoop Service.
- Singular Value Decomposition, Clustering, and Indexing for Similarity Search for Large Data Sets in High-Dimensional Spaces.
- Multiple Sequence Alignment and Clustering with Dot Matrices, Entropy, and Genetic Algorithms.
Topic 3 and 4
Students will be introduced to the processing of big data. Content will cover relevant sections of chapters 5 to 10, relating to “Big Data Processing”, of the prescribed book, which include:
- Approaches for High-Performance Big Data Processing: Applications and Challenges.
- The Art of Scheduling for Big Data Science.
- Time–Space Scheduling in the MapReduce Framework.
- GEMS: Graph Database Engine for Multithreaded Systems.
- KSC-net: Community Detection for Big Data Networks.
- Making Big Data Transparent to the Software Developers’ Community.
Topic 5 and 6
Students will be introduced to the stream techniques and algorithms of big data. Content will cover relevant sections of chapters 11 to 14, relating to “Big Data Stream Techniques and Algorithms”, of the prescribed book, which include:
- Key Technologies for Big Data Stream Computing.
- Streaming Algorithms for Big Data Processing on Multicore Architecture.
- Organic Streams: A Unified Framework for Personal Big Data Integration and Organization towards Social Sharing and Individualised Sustainable Use.
- Managing Big Trajectory Data: Online Processing of Positional Streams.
Topic 7 and 8
Students will be introduced to big data privacy. Content will cover relevant sections of chapters 15 to 16, relating to “Big Data Privacy”, of the prescribed book, which include:
- Personal Data Protection Aspects of Big Data.
- Privacy-Preserving Big Data Management: The Case of OLAP.
Topic 9 and 10
Students will be introduced to big data applications. Content will cover relevant sections of chapters 17 to 20, relating to “Big Data Applications”, of the prescribed book, which include:
- Big Data in Finance.
- Semantic-Based Heterogeneous Multimedia Big Data Retrieval.
- Topic Modelling for Large-Scale Multimedia Analysis and Retrieval.
- Big Data Biometrics Processing: A Case Study of an Iris Matching Algorithm on Intel Xeon Phi.
Topics 11 and 12
The study of big data applications will continue. Content will cover relevant sections of chapters 21 to 22, relating to “Big Data Applications”, of the prescribed book, which include:
- Storing, Managing, and Analysing Big Satellite Data: Experiences and Lessons Learned from Real-World Application.
- Barriers to the Adoption of Big Data Applications in the Social Sector.
An opportunity will be provided for a review of all work and to clarify outstanding issues. Instructors/facilitators may choose to focus on a specific area(s) of the unit.
- Software: N/A
- Version: N/A
- Instructions: N/A
- Additional resources or files: N/A