代做CITS 2401编程、代写R编程
Computer Analysis
and Visualisation
Assignment 1
Tweet Analysis
Worth: 5% of the unit
Submission: Answer the questions on the quiz server.
Deadline: 11 March 2021 5pm
Late submissions: late submissions attract 5% raw penalty per day up to 7 days (i.e., 18 March 2021 5pm). After, the mark
will be 0 (zero). Also, any plagiarised work will be marked zero.
1. Outline
Natural language processing (NLP) is useful yet a difficult task. Our UWA Cybersecurity Research Group
has been focusing on rumour detection and generation in order to prevent rumours causing harm to the
society. As a first step, We built an Automated Rumour Generation Hub (ARGH) that uses various machine
learning (ML) and NLP techniques to generate rumours that are difficult to be identified by both humans
and machines. In particular, Twitter has been used as the source dataset as we often observe different
rumours circulating this social media platform. However, they don't provide analytical functions for us to
summarise the data, so we have to do that ourselves.
In this assignment, you will be carrying out simple data analysis tasks using tweets as outlined is the
Tasks section below, mostly just to test your basic Excel competency. More complex tasks will be carried
out in other assignments (stay tuned!).
Note1: This is an individual assignment, please don't share your solution/code/files with others (only
high-level discussion is allowed, e.g., the syntax of the formula, use of array formula with other examples
etc.). If it is found to be not your Original work, then you may be penalised.
Note2: You may use intelligent formatting and colour combinations to display your worksheet in an
understandable manner. However, don't "pimp" up the worksheet.
Note3: You can find ARGH here: https://github.com/argh-rumor-detection/ARGH-Rumor-Generation,
where you can run ARGH yourself using Google Colab.
CITS 2401
Computer Analysis
and Visualisation
2. Tasks
Task 1
Import the original.txt into excel word by word. Here the term "word" refers to any sequence of letters
separated by a space. Note, the text qualifier should be set to {none} when you import the text. This
sheet should be named words_data. Finally, the whole data range should be named words. Figure 1
shows the example output of what it would look like if this task is done correctly.
Figure 1. words_data sheet snippet.
Task 2
Create a new sheet named uniques_data. Import the list of unique words from the uniques.txt file
provided. The words should be located from Cell A1. The whole range should be named uniques.
Task 3
1. In Column B: Calculate the frequency of the unique words from the words_data sheet. You must
use an array formula to do this. Name the cell range as freq.
2. In Column C: Calculate the number of letters used for each word from the words_data. This can
be calculated by simply multiplying the number of letters by its frequency count. Name the cell
range as letters.
3. In Column D: Calculate the rank based on the frequency values. You must use an array formula
to do this. Name the cell range as rank.
In addition, apply conditional formatting on rank where the bottom 10 ranked values (i.e., the 10
smallest values) are formatted with light red filled with dark red text.
Task 4
Create a new sheet named stats. Add the following columns From A2 to A7:
CITS 2401
Computer Analysis
and Visualisation
1. Average
2. Max
3. Min
4. Median
5. Mode
6. SD
Note, SD stands for standard Deviation. Also, Average and SD should be rounded to 2 decimal places.
Next, add labels as follows (Cell: Value):
1. B1: Frequency
2. C1: Letters
3. D1: >Average
4. E1: <Average
The "Frequency" category (Column B) is using freq to calculate the statistics of the data.
The "Letters" category (Column C) is using letters.
The ">Average" category (Column D) uses letters where values are greater than the average (i.e.,
value in cell C2).
The "<Average" category (Column E) uses letters where values are smaller than the average.
Populate all the statistics fields (i.e., B2:C7). You must not use any other supporting cells (i.e., you
should calculate all those stats directly using Excel formulas using previously populated cells only).
Note: the values in the uniques_data tab (i.e., freq and letters) should be treated as the entire
population.
Task 5
Create a new sheet named charts. In this sheet, create a histogram of letters. The bin size should
start from 0 and the gap between the bins are 20. For this, you must use an array formula.
Then, format the chart as follows:
1. The gap width is set to 0%.
2. The series outline is solid black line.
3. The series data labels are set to "Outside End".
4. The title is removed.
Task 6
In "charts" sheet, create a scatter chart using freq in x-axis and letters in y-axis. Then, format the
chart as follows:
1. Remove the title.
2. Add a linear trend line and display the R
2 value.
3. Add x-axis label "Frequency".
4. Add y-axis label "Letters".
The sample image of the charts Sheet is provided in Figure 2, and your solution may look similar to this.
However, the data shown in the image is sample data and is not the correct result (I.e., your figures
may look different). The yellow blocks are added to the image to hide the sample data to avoid confusion.
Task 7
Insert a Treemap of uniques and letters into the "charts" sheet. Remove the title and the legend.
Change the size to 15cm height and 25cm width.
CITS 2401
Computer Analysis
and Visualisation
Figure 2. Sample image of the charts sheet (treemap not shown)
3. Submission
You should answer the questions related to the tasks above on the quiz server by the due date - 11 March
2021 5pm (drop dead due date 18 March 2021 with 5% raw penalty per day).
Submit your Excel workbook on the quiz server, you should name the file as A1_[student id].xlsx.
For example, if your student ID is 12345678, then your file name is A1_12345678.xlsx.
Fail to follow this will result in a penalty of 50%.
CITS 2401
Computer Analysis
and Visualisation
4. Rubrics
Criteria Highly Satisfactory (D, HD) Satisfactory (P, CR) Unsatisfactory (N)
Excel functions
(10 marks)
Understand various
Excel functions.
Demonstrate the
ability to carry out
various Excel
functionalities and
tools.
Demonstrated the ability to use
Excel functions fluently:
Correct use of Excel functions
as appropriate.
Demonstrated the Ability
to use Excel functions:
Some correct uses of
Excel functions.
Failed to demonstrate
the ability to use Excel
functions:
Incorrect use of
Excel functions.
Excel formulas
(20 marks)
Understand the use of
Excel formulas.
Demonstrated the ability to
utilise Excel formulas fluently:
Correct use of Excel formulas
most adequate for the
problem.
Comprehensive understanding
of Excel formulas and their
usage.
Demonstrated the ability
to utilise Excel formulas:
Correct use of Excel
formulas for the
problem.
Understanding of
Excel formulas and
their usage.
Failed to demonstrate
the ability to Utilise
Excel formulas:
Incorrect use of
Excel formulas.
Misunderstanding
of Excel formulas
and their usage..
Excel visualisation
(20 marks)
Understand the use of
Excel visualisation
tools.
Demonstrated the ability to
visualise using Excel:
The visualisation generated is
accurate and comprehensive.
Demonstrated the ability
to visualise using Excel:
The visualisation
generated is accurate.
Demonstrated the
ability to visualise using
Excel:
The visualisation
generated is not
accurate.
This assignment is worth a total of 50 marks.
如有需要,请加QQ:99515681 或WX:codehelp
|