Last Updated | S022020 |
DEng 803
Unit Name | BIG DATA ANALYSIS AND PATTERN RECOGNITION |
Unit Code | DENG803 |
Unit Duration | 24 weeks |
Award | Doctor of Engineering (Duration: 3 years) |
Year Level | Two |
Unit Coordinator | Dr Akhlaqur Rahman |
Core/Sub-Discipline: | Core |
Pre/Co-requisites | DEng 801 |
Credit Points | 8 (Total Program Credit Points: 120) |
Mode of Delivery | Online or on-campus. |
Unit Workload | 20 hours per fortnight: Lecture - 1 hour; Tutorial - 1 hour; Assessments / Practical / Lab - 2 hours (where applicable); Personal Study recommended - 16 hours (guided and unguided) |
Unit Description and General Aims
Large amounts of data (‘big data’) in the form of digital records have the potential to be transformed into useful research input. That transformation can be realized through the application of computer analysis. Cognitive computers can process huge amounts of data but are less effective at making judgmental decisions. Big data analysis is a relatively new discipline which interfaces with statistics, database technology, and pattern recognition. It is concerned with the analysis of large databases to find relationships which are of interest or of value to researchers. Due to the sheer size of the data involved, new challenges arise for pattern recognition.
This unit imparts to candidates the procedural skills for researching massive amounts of data to identify patterns and correlations, a practice referred to as ‘big data’ analysis.
At the conclusion of this unit students should be able to: identify the significance of big data to emerging interdisciplinary research; comprehend the big data analytics process; have an awareness of the challenges within big data analytics; understand the development and techniques for the design of systems capable of performing a given recognition task for a specific research application; have an awareness of the fundamental problems in pattern recognition system design; and interpret large data sets and develop pattern recognition conclusions and recommendations.
Learning Outcomes
On successful completion of this Unit, students are expected to be able to:
- Evaluate the significance of big data to emerging interdisciplinary research and prediction of the future.
Bloom’s Level 5
- Justify the big data analytics process by defining the research requirement and taking an integrated solution approach.
Bloom’s Level 5
- Evaluate the challenges within big data analytics (data complexity; computational complexity; and system complexity).
Bloom’s Level 5
- Hypothesise the development and techniques for the design of systems capable of performing a given recognition task for a specific research application.
Bloom’s Level 6
- Express the fundamental problems in pattern recognition system design (representation of input data; pre-processing and feature extraction problem; and determination of optimum decision procedures).
Bloom’s Level 3
- Infer and interpret sources of big data and develop pattern recognition conclusions and recommendations.
Bloom’s Level 2
Student assessment
Assessment Type | When assessed (e.g. After Topic 5) | Weighting (% of total unit marks) | Learning Outcomes Assessed |
Assessment 1 Type: Multi-Choice Test Word length: n/a Questions from the content covered over the first weeks of instruction. | After Topic 4 | 15% | 1, 2 |
Assessment 2 Type: Data analysis application Word length: 2000 A ‘self-originated’ question by the student to be answered within the boundaries, scope and study reference material provided by the examiner. | After Topic 9 | 35% | 3, 4 |
Assessment 3 Type: Design plan Word length: 4000 From the Industry Project DEng700 or a data source stipulated by the lecturer, develop a clear and concise description of the big data and pattern recognition model. Identify the important factors that affect the application of the model and develop mitigating solutions to overcome possible adverse conditions. | Final week | 50% | 1-6 |
Prescribed and Recommended Readings
Required Textbook(s)
The required textbook provides important references in each chapter which are relevant to the subject matter. These references and those provided by the instructor will form the basis of the study material. The following textbook serves as a study guide and as a future reference book for statistical theory, numerous research methods, calculations and visuals.
- Li, K.-C., Jiang, H., Yang, L. T., & Cuzzocrea, A. (2015). Big Data: Algorithms, Analytics, and Applications. CRC Press. ISBN: 978-1-4822-4055-9
Reference Materials
In addition to the above textbook, there are several useful related reference materials which may be obtained online from published journals and websites. These resources may be obtainable from Scopus, Web of Science, or SAGE.
Software Reference Material
Software can be applied in the processing of data and the professional presentation of computed results. There are numerous software packages which can be applied. For convenience and affordability, the Microsoft Office Excel ‘add-on’ software XLSTAT-Base is proposed.
- The proposed XLSTAT-Base solution software is for data mining, machine learning, tests, data modelling and visualization. This software tool can be applied for data preparation and visualization, parametric and nonparametric tests, modelling methods (ANOVA, regression, generalized linear models, mixed models, nonlinear models), data mining features (principal component analysis, correspondence analysis) and clustering methods (Agglomerative Hierarchical Clustering, K-means). XLSTAT-Base also features machine learning methods (association rules, regression and classification trees, and K-Nearest Neighbours), partial least squares regression and more. It is IET’s viewpoint that XLSTAT-Base will be a comprehensive and affordable research tool for the candidate’s final research project. Further reading is available at https://www.xlstat.com/en/solutions/base.
- Alternative software, such as Maple, Quantum XL or MATLAB, may be applied to smaller case studies.
Unit Content
Topics 1, 2, and 3
Students will be introduced to the subject matter of big data analytics by identifying the significance of big data to emerging interdisciplinary research and prediction of the future. Students will review the concept of data analytics and the emergence of big data. Big data drivers and processes are also reviewed.
- Data acquisition, pre-processing, visualisation.
- Data mining algorithms.
- R for big data.
- Big data business drivers and processes.
Topics 4 and 5
Students will be introduced to scalability in big data management. Content will cover relevant sections of the first four chapters of the prescribed book, relating to “Big Data Management”, which include:
- Scalable Indexing for Big Data Processing.
- Scalability and Cost Evaluation of Incremental Data Processing Using Amazon’s Hadoop Service.
- Singular Value Decomposition, Clustering, and Indexing for Similarity Search for Large Data Sets in High-Dimensional Spaces.
- Multiple Sequence Alignment and Clustering with Dot Matrices, Entropy, and Genetic Algorithms.
Topics 6 and 7
Students will be introduced to the processing of big data. Content will cover relevant sections of chapters 5 to 10 of the prescribed book, relating to “Big Data Processing”, which include:
- Approaches for High-Performance Big Data Processing: Applications and Challenges.
- The Art of Scheduling for Big Data Science.
- Time–Space Scheduling in the MapReduce Framework.
- GEMS: Graph Database Engine for Multithreaded Systems.
- KSC-net: Community Detection for Big Data Networks.
- Making Big Data Transparent to the Software Developers’ Community.
Topics 8 and 9
Students will be introduced to the stream techniques of big data and to privacy issues. Content will cover relevant sections of chapters 11 to 16 of the prescribed book, relating to “Big Data Stream Techniques and Algorithms” and “Big Data Privacy”, which include:
- Key Technologies for Big Data Stream Computing.
- Streaming Algorithms for Big Data Processing on Multicore Architecture.
- Organic Streams: A Unified Framework for Personal Big Data Integration and Organization towards Social Sharing and Individualised Sustainable Use.
- Personal Data Protection Aspects of Big Data.
- Privacy-Preserving Big Data Management.
Topics 10 and 11
Students will be introduced to big data applications. Content will cover relevant sections of chapters 17 to 21 of the prescribed book, relating to “Big Data Applications”, which include:
- Big Data in Finance.
- Semantic-Based Heterogeneous Multimedia Big Data Retrieval.
- Topic Modelling for Large-Scale Multimedia Analysis and Retrieval.
- Big Data Biometrics Processing: A Case Study of an Iris Matching Algorithm on Intel Xeon Phi.
- Storing, Managing, and Analysing Big Satellite Data: Experiences and Lessons Learned from Real-World Application.
Topic 12
An opportunity will be provided for a review of all work and to clarify outstanding issues. Instructors/facilitators may choose to focus on a specific area(s) of the unit.
Software/Hardware Used
Software
- Software: N/A
- Version: N/A
- Instructions: N/A
- Additional resources or files: N/A
Hardware
- N/A