Collating and Glancing Over Bar Association of Queensland Data

So it’s been a couple of months, and I have a few fairly large (related) projects in the pipeline. As a result, the website has been quiet, and I haven’t been building on my existing content. So I’m going to do a local-scale article on the QLD Bar Association data, and I’m going to do it in a single afternoon so I can get back to my major projects. As such, it will be a fairly cursory examination.

Every year the QLD Bar Association publishes an Annual Report. And every year not enough people read it. Well, I do, and I love data, so I took it upon myself to collate the data from many years’ reports into one large spreadsheet. I’m writing this article blind (without looking at the results first), so it’s being written as the data comes out and not retroactively edited (except to fix punctuation etc.) – so let’s see what it says together.

At a glance

In the 2022 report, we have location breakdowns, gender breakdowns and senior/junior breakdowns [10], and lawyer totals on pages 22/23 [P10-11]. Page 14 has silk appointments and page 15 has judicial appointments.

Let’s jump back a decade at a time and see if all that data is still there.

We have female members on p17 in a less comparative format, and total members with a crappier mapping on [p10]. P16 has silk appointments, and p30 has the full appointments.

I could not find a comparative senior counsel report.

Well, that’s usable, if a little inconsistent. The newer data is more complete (given the digital age and all), but it gives us a baseline for data entry. We’ll have to decide when the changeover happens, and exactly what to do with the grouped vs ungrouped data, but overall, it’s workable. What we’ll do is create a spreadsheet with multiple columns, NULL the overlaps, then join them together. For now we’ll ignore senior/junior breakdowns, as these don’t persist through the dataset.

In layman’s terms, I’m just creating a spreadsheet column for each entry type, and if something’s missing from a year (say 2012 has REGIONAL, but 2022 only has specific locations), I leave it as an X. This results in twice as many entries, but they can be combined later. That way, the data is preserved.
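To make the “NULL the overlaps, combine later” idea concrete, here is a minimal sketch in plain Python rather than Excel. The figures are made up, None stands in for the X, and the list of regional sections (just FNQ and Central Queensland here) is an assumption for illustration; the real list would come from the newer reports.

```python
# Hypothetical figures: one row per year, one key per entry type.
# None stands in for the "X" used when a year doesn't report that category.
rows = {
    2012: {"Regional": 120, "FNQ": None, "Central Queensland": None},
    2022: {"Regional": None, "FNQ": 45, "Central Queensland": 60},
}

# Assumed (incomplete) mapping of the newer, finer sections onto the older "Regional"
REGIONAL_PARTS = ["FNQ", "Central Queensland"]


def combined_regional(row):
    """Use the grouped figure if present, otherwise sum the specific locations."""
    if row["Regional"] is not None:
        return row["Regional"]
    parts = [row[name] for name in REGIONAL_PARTS if row[name] is not None]
    # If nothing was recorded at all, keep the gap rather than inventing a zero
    return sum(parts) if parts else None


for year, row in sorted(rows.items()):
    print(year, combined_regional(row))
```

The raw columns are never overwritten; the combined figure is derived on demand, so the original entries stay intact.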

Let’s do up a spreadsheet from the data and flag any issues inside it.

Spreadsheet: Set1_no_cleaning

Basic Excel Analysis

I left my working notes inside the spreadsheet.
A few notes on the preliminary data

Issue 1 – Data Consistency – Barrister Location

Most significantly, not all years use the same categories for private bar location. Similarly, the numbers in the annual reports do NOT equal the annual membership tallies. In layman’s terms, one set of figures is recorded at the time of each year’s report (allegedly), while the other is recorded on an annual basis. It is impossible to tell exactly what methods were used to ensure data consistency. The key problems are as follows:

1. There will be a jump from annual to non-annual data.
2. The reports are compiled at different times within the year.
3. The numbers don’t add up because of the slight measurement inconsistency.
4. The female numbers are only measured for the report. The exact date of measurement is unspecified, and the report dates vary.

As such, we have two choices:

1. We can use a single ‘block’ from the chart in the 2020-2021 report. This will give ANNUAL data for the prior 10 years, but will not exactly match the time of each report. This has some difficulty, as we are attaching the gender data, taken at semi-random times, to the consistently timed report data. It essentially means we are using the 2020 report for all the consistent data, and the individual reports for the inconsistent data. Still, in the author’s opinion, this is the least damaging of the options.
2. We can use the data from each report at the time of writing. The issue there is that there is no real integrity in when the data was taken. Additionally, as well as the reports being on different dates, there’s no way to tell the exact date the measurement was taken (presumably prior to the report’s creation).

For a more thorough statistical analysis, 2 is probably superior (in fact, one would take 1 and 2 and use them to create a larger dataset). However, for this analysis, the purpose is to be substantially accurate and ‘readable’, and method 1 is much simpler to interpret, if slightly less accurate.

Solution – Change nothing; accept a margin of error on this point due to inconsistency in data entry

Issue 2 – Data Consistency – Judges

There are 2 sub-issues with the judges. The first lies in the retitling/promotion of judges. A new justice, for example, changes the judicial distribution, but an additional honorific won’t alter the existing categories; it will, however, result in the judge being counted twice. Given the number of honorifics, and judges being appointed from different places, it is difficult to wholly separate the two on this dataset. As such, an honorific is considered an appointment, except when the judge has (on this data) already been appointed at that level, in which case the latter appointment will be deleted.

The second issue is more nuanced: the claiming of judges by QLD is not always fair. For example, Justice Edelman is listed in the QLD Bar’s list of ‘acknowledged appointments’, despite his connection to WA being overwhelmingly stronger. There is no easy solution, and beyond going through cases individually (which is outside the scope of this dataset), it is just a source of error that has to be acknowledged.

Solution – code

An honorific is considered an appointment, except when the judge has (on this data) already been appointed at that level, in which case, the latter appointment will be deleted. This will result in a slight overcounting but should deal with the most egregious cases.

I used ChatGPT to generate this lovely square of code:

```python
from collections import Counter

import pandas as pd  # reading .xlsx files also needs the openpyxl package installed

df = pd.read_excel("INSERT LOCATION HERE -> I HAVE REMOVED THIS FOR OBVIOUS REASONS")


def find_duplicate_names(column_name, df=df):
    # Each cell holds a comma-separated list of names: split, strip and flatten them
    all_names = [
        name.strip()
        for cell in df[column_name].dropna()
        for name in str(cell).split(",")
        if name.strip()
    ]

    # Count the occurrences of each name
    name_counts = Counter(all_names)

    # Names that occur more than once are duplicates
    duplicates = [name for name, count in name_counts.items() if count > 1]
    print(column_name)
    print(duplicates)


# Column names exactly as they appear in the spreadsheet (including its spelling)
searchList = ["Appointment_CA", "Judicial_Appoitment_SC", "Appointment_MC",
              "Appointment_Federal_Family_Kids_Fedcircuit", "Appointment_District"]

for entry in searchList:
    find_duplicate_names(entry)
```

Output:

```
Appointment_CA
[]
Judicial_Appoitment_SC
['Peter Davis']
Appointment_MC
[]
Appointment_Federal_Family_Kids_Fedcircuit
[]
Appointment_District
[]
```

So we have 1 duplicate: Peter Davis. Given I clerked for him, this very much feels like biting the hand that feeds me – but on the other hand, I can’t say that he counts as two.
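The script above only reports duplicates; the deletion half of the honorific rule could look something like the sketch below. The records are hypothetical (year, name, level) tuples rather than the spreadsheet’s comma-separated cells, and “level” is an assumed label for the court.

```python
# Hypothetical appointment records: (year, name, level), in no particular order.
appointments = [
    (2015, "Judge A", "District"),
    (2018, "Judge B", "Supreme"),
    (2021, "Judge B", "Supreme"),   # later honorific at the same level
    (2022, "Judge A", "Supreme"),   # promotion to a new level: still counts
]


def drop_repeat_appointments(records):
    """Keep the earliest appointment per (name, level); drop later repeats."""
    seen = set()
    kept = []
    for year, name, level in sorted(records):  # tuples sort by year first
        key = (name, level)
        if key in seen:
            continue  # already appointed at this level: delete the latter entry
        seen.add(key)
        kept.append((year, name, level))
    return kept


for record in drop_repeat_appointments(appointments):
    print(record)
```

Sorting first means “the latter appointment” is always the later year, while a promotion to a different level survives because its (name, level) key is new.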

Issue 3 – Turning Names into Data (quick solution)

We’re going to turn our judicial names into male, female and total counts. We can also add a new column for general totals. A five-minute Google search resolved the genders of all the ambiguous names.
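A sketch of that conversion, assuming each year’s appointments sit in a comma-separated cell (as in the script above) and the genders have been collected into a hand-built lookup. The names and the lookup are hypothetical placeholders, and the cell is assumed to be a string or None.

```python
# Hypothetical lookup built by hand (e.g. from a quick search).
GENDER = {"Alex Smith": "F", "Sam Jones": "M", "Chris Lee": "M"}


def gender_counts(cell, lookup=GENDER):
    """Turn a comma-separated cell of names into (male, female, total) counts."""
    if not cell:
        return (0, 0, 0)
    names = [name.strip() for name in cell.split(",") if name.strip()]
    male = sum(1 for name in names if lookup.get(name) == "M")
    female = sum(1 for name in names if lookup.get(name) == "F")
    # Names missing from the lookup still count towards the total
    return (male, female, len(names))


print(gender_counts("Alex Smith, Sam Jones, Chris Lee"))
```

Because unknown names are kept in the total, any gaps in the lookup show up as male + female falling short of the total rather than silently disappearing.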

Issue 4 – Combining Categories

Because different counting systems were used, the categorization of practitioners by location is split into two subsets. For example, pre-2021 there was a category for ‘regional’; after that, it is split into a series of sections (e.g. FNQ and Central Queensland). As a result, we’re going to have to fuse some data.

The two systems in the data we took are as follows

On closer inspection, these are actually meaningfully different, as the former separates the private and employed bar and the latter doesn’t. So our original plan to fuse these has failed catastrophically. Oh well, easy come, easy go.

Solution

There is no clean solution. The data is substantially different, and ultimately incompatible. We could get broad trends from the totals, but overall the two sets do not overlap: one is broken down by geography alone, and the other by geography and by role, adding a dimension of complexity. Again, as this is meant to be cursory, we will not attempt to ‘remove’ the extra dimension.

Nevertheless, with 8 years of data in the former set, we can do some analysis on that, accepting that the newer data is different (note that different does not mean bad; I personally think it’s very useful, but in this case it does mean probable incompatibility).
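For anyone who did want the broad trends from totals, the simplest version is to sum away the role dimension, leaving one figure per location. This is a hypothetical sketch with invented numbers; it does nothing about the mismatched location categories, and it assumes each member is counted under exactly one role.

```python
from collections import defaultdict

# Hypothetical two-dimensional entries: (location, role) -> count
by_location_and_role = {
    ("Brisbane", "Private"): 900,
    ("Brisbane", "Employed"): 150,
    ("Regional", "Private"): 100,
    ("Regional", "Employed"): 20,
}


def totals_by_location(counts):
    """Sum over the role dimension, leaving one total per location."""
    totals = defaultdict(int)
    for (location, _role), n in counts.items():
        totals[location] += n
    return dict(totals)


print(totals_by_location(by_location_and_role))
```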

Issue 5 – Percentages and Totals

We’re going to convert our percentages into whole-number counts so our columns all use the same units. As we have totals, this is a relatively trivial process.
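The conversion is just count = percentage / 100 × total, rounded to a whole number. A minimal sketch (the 30% and 1,200 are invented):

```python
def percentage_to_count(percentage, total):
    """Convert a percentage of a known total into a whole-number count."""
    # Python's round() sends exact .5 ties to the nearest even number
    return round(percentage / 100 * total)


# Hypothetical year: 30% of 1,200 members
print(percentage_to_count(30, 1200))
```

Because the published percentages are already rounded, converted counts may be off by one or two from the true figures, which sits within the margin of error already accepted.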

Plots and Patterns (and a new excel)

Spreadsheet: Set1_cycle_1_clean

Let’s start by plotting the basic sexist stuff, because that is the easiest statistic to look at on this data. Let’s look at female representation, silk appointment percentage, and the gender of appointments. We can also look at total appointments compared to the number of lawyers, as well as regional/city distributions.

We can just use a line plot to do the basics, and this should be illustrative. We can then drill down on anything interesting.

As this is not an ‘equality’ piece per se, and just some guy randomly looking at data, this isn’t the article to discuss targets or the like (and if I did, it would blow out in length, as there are a lot of complicated historical figures that color the statistics). As such, I will focus on the data, not the implications behind it or associated information that could be used in conjunction with it.

Let’s start by looking at membership and silk appointments by gender.

Silks and Barristers

This final plot is not intuitive. It shows the percentage of members of each gender promoted to silk each year, combining the two tables above.
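As a rough illustration of how that final plot is derived, each gender’s rate is that gender’s new silks divided by that gender’s members in the same year. The figures below are invented for a single hypothetical year. Note that 3 of 15 appointments being female (20%) and the per-member rates are different quantities.

```python
# Hypothetical figures for a single year
members = {"F": 300, "M": 900}          # barristers by gender
silk_appointments = {"F": 3, "M": 12}   # new silks by gender


def silk_rates(members, silks):
    """Percentage of each gender's members appointed silk that year."""
    # Skip any gender with zero members to avoid dividing by zero
    return {g: 100 * silks.get(g, 0) / members[g] for g in members if members[g]}


print(silk_rates(members, silk_appointments))
```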

Basic Analysis

So we have a ‘very’ slow improvement in the total number of female members, but not as much as perhaps would be expected. The bar is still overwhelmingly comprised of males.

The silk appointment rate of 20% female is fairly complicated to interpret. There is a wider picture of under-representation at the bar, and since the number of female practitioners has grown dramatically over the last 25 years, the converse is that there are relatively far fewer female practitioners with 25+ years’ experience. I leave the number open to interpretation, as I could see it being used on both sides of an argument.

Next we can look at the individual courts. I always hesitate to aggregate courts, as there are competing arguments about how to aggregate them – do we class the District Court as inferior or superior? (I realize it is ‘technically’ inferior, but in practical terms many consider it closer to the Supreme Court than the Magistrates Court.)

Analysis

So is there anything we can glean from this? The most noticeable trend is the push for female magistrates and, to a lesser extent, District Court judges from about 2018: the percentages skyrocket in both graphs (and prior to that the figures were much worse). Otherwise, there’s nothing too dramatic to observe. The Court of Appeal looks flashy, but it has such a small sample size that it’s difficult to give it much credence.
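To put a rough number on the small-sample problem: if each appointment were an independent draw (a big simplification), the standard error of an observed proportion p from n appointments is sqrt(p(1 - p) / n). A quick sketch, with n = 4 chosen purely as an example:

```python
import math


def proportion_standard_error(p, n):
    """Standard error of an observed proportion p from n independent draws."""
    return math.sqrt(p * (1 - p) / n)


# With only 4 appointments in a year and half of them female,
# the observed 50% carries roughly +/- 25 percentage points of noise.
print(round(100 * proportion_standard_error(0.5, 4)))
```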

Location Distribution – Ending 2020

So sadly, we can’t use the last 2 years of data for this, because of the entry differences. Specifically, since one entry is two-dimensional (location/role) and the other is one-dimensional (location), we can’t put them together without a good bit of math and extrapolation. Nevertheless, we can still make a big chart from the 8-year period and examine it.

A small but distinct increase in Brisbane work relative to the other sectors is noticeable, but largely the distributions have remained the same.
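Because the total size of the bar changes from year to year, a shift like this is best read as a change in each location’s share rather than in raw counts. A minimal sketch with invented numbers and an assumed “Other” catch-all category:

```python
# Hypothetical counts for two years
counts = {
    2013: {"Brisbane": 700, "Regional": 150, "Other": 50},
    2020: {"Brisbane": 800, "Regional": 150, "Other": 50},
}


def shares(year_counts):
    """Each location's percentage of that year's total."""
    total = sum(year_counts.values())
    return {loc: 100 * n / total for loc, n in year_counts.items()}


for year in sorted(counts):
    print(year, {loc: round(s, 1) for loc, s in shares(counts[year]).items()})
```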

And that’s it. This was meant to be a cursory collation and examination of 10 years of QLD Bar Annual Report data, and the creation of a spreadsheet that allows for deeper analysis if one so desires. The spreadsheets are compiled entirely from the publicly available Bar Association Annual Reports, with the numbers entered as described.

Thanks for reading.