Analysis with Julia

After you have Julia installed (see Julia Quick Start), you might want to analyze some data using Julia. Here’s a collection of examples that can help you do that. They are sorted based on a common workflow:

  1. Import data
  2. Plot data Many 2D and 3D plot examples have been collected in Beautiful Makie, and it’s useful to browse through this collection to understand ways to plot your data.
  3. Fit data
  4. Determine significance

Importing data

Here are some strategies to import and save data.

Importing data from a csv file

Use the CSV.jl and DataFrames.jl packages:

using CSV, DataFrames

n = 10 #Let's generate 10 random numbers
x = sort((rand( -10:100, n)))
y = 5/9 .* (x  .- 32) #A conversion from Fahrenheit to Celsius

data = DataFrame(F=x,C=y) #Turn it into a neat little data frame

CSV.write("data.csv", data) #write the data to a CSV
data_new = CSV.read("data.csv", DataFrame) #read the data from a CSV

x_values = data_new.F # Access columns by name
y_values = data_new.C

Importing a tiff image

Use TiffImages.jl for TIFF loading:

using TiffImages

img = TiffImages.load("image.tif")
height, width = size(img)
#Julia uses complicated types for images. I often convert to a simpler format.
img_float = Int16.(img) 

Importing data from a multidimensional tiff image

For stacks, time series, and multi-channel images:

using TiffImages

img_stack = TiffImages.load("stack.tif")

n_frames = size(img_stack, 3) # For 3D data (height × width × frames)

CHANNELS = 4 #As an example
channel_1 = img_stack[:, :, 1:CHANNELS:end]
channel_2 = img_stack[:, :, 2:CHANNELS:end]
channel_3 = img_stack[:, :, 3:CHANNELS:end]
channel_4 = img_stack[:, :, 4:CHANNELS:end]

Saving your Julia workspace with JLD2.jl

Save and load Julia variables and workspaces as .jdl2 files, like you can in Matlab using .mat files.

using JLD2

@save "workspace.jld2" # Save everything

# Save specific variables
@save "workspace.jld2" time_points measurements all_data

# Load variables back
@load "workspace.jld2" time_points measurements all_data

# Or load into a dictionary
data = load("results.jld2")
loaded_time = data["experiment_time"]

Importing data from an xml file

Use EzXML.jl for XML parsing. This is more difficult and probably requires browsing the XML file for a bit to understand it’s structure.

using EzXML

doc = parsexml(read("data.xml", String))

root = doc.root # Navigate to specific elements
for element in eachelement(root)
    value = element["attribute_name"]     # Extract attributes
    text = nodecontent(element)     # Extract text content
end

Fitting data

Use LsqFit.jl for curve fitting:

Exponential decay fit

using GLMakie, LsqFit

t_data = 0:9
y_data = [95.2, 74.1, 58.3, 46.2, 37.1, 30.2, 25.1, 21.5, 18.9, 17.1]

#= Exponential decay: y = A * exp(-t/τ) + baseline
We create a function we call exp_decay of that form
But this could be any function you want to fit to!=#
exp_decay(t, p) = p[1] .* exp.(-t ./ p[2]) .+ p[3]

p_guess = [100.0, 3.0, 10.0] # Initial parameter guesses

fit_result = curve_fit(exp_decay, t_data, y_data, p_guess)
fit_params = coef(fit_result)

#Fit with 100 subsamples
x_fit_range = range(t_data[1],t_data[end], 100)
y_fit = exp_decay(x_fit_range, fit_params)

#Plot
fig = Figure(size = (600, 400))
ax = Axis(fig[1, 1])
scatter!(ax, t_data, y_data, markersize = 10, color=:black)
lines!(ax, x_fit_range, y_fit, linewidth = 3, color=:red)

fig
Precompiling packages...
    520.8 msStaticArraysCore
    577.4 msAdapt
    881.6 msProgressMeter
    888.8 msAxisArrays
    948.1 msFillArrays
   1302.3 msPDMats
    719.4 msPreferences
    832.5 msSimpleTraits
   2104.6 msIrrationalConstants
    645.2 msAdapt → AdaptSparseArraysExt
    543.4 msOffsetArrays → OffsetArraysAdaptExt
   1537.2 msCompat
   1889.3 msStructArrays
    574.5 msFillArrays → FillArraysStatisticsExt
    838.1 msFillArrays → FillArraysSparseArraysExt
    645.7 msPrecompileTools
    813.9 msFillArrays → FillArraysPDMatsExt
    894.9 msJLLWrappers
   4594.5 msBaseDirs
   1226.6 msStructArrays → StructArraysAdaptExt
   1298.9 msLogExpFunctions
   1955.2 msComputePipeline
   1289.6 msCompat → CompatLinearAlgebraExt
   3357.9 msFileIO
   1686.4 msStructArrays → StructArraysSparseArraysExt
    532.2 msStructArrays → StructArraysLinearAlgebraExt
    689.5 msGraphite2_jll
    640.2 msLibmount_jll
    636.7 msEpollShim_jll
    694.3 msLLVMOpenMP_jll
    692.5 msBzip2_jll
    704.4 msRmath_jll
    649.2 msXorg_libXau_jll
    745.0 mslibpng_jll
    766.4 mslibfdk_aac_jll
   3492.5 msColorSchemes
    701.6 msImath_jll
    673.9 msGiflib_jll
    699.2 msLAME_jll
   1175.1 msIntelOpenMP_jll
    690.4 msLERC_jll
    692.6 msEarCut_jll
    680.2 msCRlibm_jll
    762.4 msJpegTurbo_jll
    743.3 msXZ_jll
    681.3 msOgg_jll
    754.2 msoneTBB_jll
    610.8 msXorg_libXdmcp_jll
    846.8 msx265_jll
   7453.1 msStaticArrays
   1160.0 msx264_jll
   1083.5 mslibaom_jll
    914.2 msZstd_jll
   1083.9 msExpat_jll
    972.0 msOpus_jll
    980.7 msLZO_jll
    936.8 msXorg_xtrans_jll
   1034.6 msLibiconv_jll
    998.3 msLibffi_jll
    980.2 msisoband_jll
   9963.1 msSIMD
   1224.4 msFFTW_jll
  10115.2 msParsers
    663.9 msLibuuid_jll
    957.5 msOpenSpecFun_jll
    780.6 msFriBidi_jll
    810.4 msOpenBLASConsistentFPCSR_jll
    568.9 msLogExpFunctions → LogExpFunctionsInverseFunctionsExt
    723.0 msPixman_jll
    790.6 msFreeType2_jll
    909.4 msRmath
   1580.8 msQOI
    866.5 msCRlibm
   1109.3 msOpenEXR_jll
   2277.1 msDataStructures
   1015.6 mslibsixel_jll
   2604.6 msChainRulesCore
  17591.4 msUnitful
    765.4 msXorg_libxcb_jll
   1003.3 mslibvorbis_jll
    759.6 msStaticArrays → StaticArraysStatisticsExt
    750.9 msConstructionBase → ConstructionBaseStaticArraysExt
    685.7 msAdapt → AdaptStaticArraysExt
   1603.5 msMKL_jll
    616.3 msDbus_jll
   1016.0 msLibtiff_jll
    725.4 msWayland_jll
    977.4 msGettextRuntime_jll
    680.2 msIsoband
   1792.1 msStructArrays → StructArraysStaticArraysExt
  16013.4 msImageCore
   1825.7 msFreeType
   1697.8 msFontconfig_jll
   1991.2 msJSON
    768.5 msSortingAlgorithms
   3086.7 msSpecialFunctions
   1902.8 msOpenEXR
   2101.1 msQuadGK
    554.1 msAbstractFFTs → AbstractFFTsChainRulesCoreExt
   1643.9 msChainRulesCore → ChainRulesCoreSparseArraysExt
   3980.1 msIntervalArithmetic
    874.7 msUnitful → ConstructionBaseUnitfulExt
   1647.4 msLogExpFunctions → LogExpFunctionsChainRulesCoreExt
    845.9 msUnitful → InverseFunctionsUnitfulExt
   1839.5 msStaticArrays → StaticArraysChainRulesCoreExt
    906.3 msXorg_libX11_jll
    927.6 msUnitful → PrintfExt
  10379.0 msPlotUtils
   1192.4 msGlib_jll
   1822.6 msImageBase
    930.0 msColorBrewer
   2818.7 msJpegTurbo
   2405.4 msSixel
   4437.9 msPNGFiles
   1319.6 msHypergeometricFunctions
  11699.5 msAutoma
   2165.2 msSpecialFunctions → SpecialFunctionsChainRulesCoreExt
  13229.0 msGeometryBasics
   1061.5 msIntervalArithmetic → IntervalArithmeticSparseArraysExt
   1085.1 msColorVectorSpace → SpecialFunctionsExt
    748.8 msIntervalArithmetic → IntervalArithmeticIntervalSetsExt
   1012.4 msIntervalArithmetic → IntervalArithmeticLinearAlgebraExt
    758.4 msXorg_libXext_jll
    780.7 msXorg_libXfixes_jll
    793.0 msXorg_libXrender_jll
   3970.9 msStatsBase
   7472.5 msFFTW
    761.7 msXorg_libxkbfile_jll
    900.0 msPacking
   3215.4 msInterpolations
   1908.4 msImageAxes
   1885.2 msShaderAbstractions
    812.1 msXorg_libXinerama_jll
    741.5 msLibglvnd_jll
   3098.4 msStatsFuns
   2872.5 msMeshIO
    751.7 msXorg_libXi_jll
    759.3 msXorg_libXrandr_jll
   2604.7 msFreeTypeAbstraction
    740.2 msXorg_libXcursor_jll
    714.1 msXorg_xkbcomp_jll
   1624.3 msCairo_jll
    723.5 msStatsFuns → StatsFunsInverseFunctionsExt
   1171.8 msInterpolations → InterpolationsUnitfulExt
   1347.1 msImageMetadata
   1358.8 mslibwebp_jll
   4003.4 msExactPredicates
   2216.3 msStatsFuns → StatsFunsChainRulesCoreExt
   1800.0 msXorg_xkeyboard_config_jll
   6934.0 msGridLayoutBase
   1618.9 msHarfBuzz_jll
    636.5 msxkbcommon_jll
   1097.6 mslibass_jll
   2562.4 msNetpbm
    906.2 msPango_jll
   2110.7 msWebP
    581.2 mslibdecor_jll
    634.0 msGLFW_jll
   1866.7 msFFMPEG_jll
   5452.8 msMathTeXEngine
   5658.4 msDistributions
   1119.6 msGLFW
   4198.8 msDelaunayTriangulation
   1138.3 msDistributions → DistributionsTestExt
   1792.5 msDistributions → DistributionsChainRulesCoreExt
   1469.0 msKernelDensity
  30170.9 msTiffImages
    911.4 msImageIO
 116142.0 msMakie
  50795.5 msGLMakie
  170 dependencies successfully precompiled in 226 seconds. 115 already precompiled.
Precompiling packages...
   1684.4 msQuartoNotebookWorkerLaTeXStringsExt (serial)
  1 dependency successfully precompiled in 2 seconds
Precompiling packages...
   1136.4 msQuartoNotebookWorkerJSONExt (serial)
  1 dependency successfully precompiled in 1 seconds
Precompiling packages...
   1047.7 msQuartoNotebookWorkerTablesExt (serial)
  1 dependency successfully precompiled in 1 seconds
Precompiling packages...
   5415.2 msQuartoNotebookWorkerMakieExt (serial)
  1 dependency successfully precompiled in 5 seconds
Precompiling packages...
    478.9 msDiffResults
    563.0 msCommonSubexpressions
    650.3 msADTypes
    662.9 msDiffRules
   1039.6 msSetfield
    466.6 msADTypes → ADTypesConstructionBaseExt
   1328.9 msArrayInterface
    444.5 msArrayInterface → ArrayInterfaceStaticArraysCoreExt
    539.7 msArrayInterface → ArrayInterfaceSparseArraysExt
   1536.7 msDifferentiationInterface
    577.7 msFiniteDiff
    572.8 msDifferentiationInterface → DifferentiationInterfaceSparseArraysExt
    514.5 msDifferentiationInterface → DifferentiationInterfaceFiniteDiffExt
    569.9 msFiniteDiff → FiniteDiffSparseArraysExt
   3521.1 msForwardDiff
   1418.5 msDifferentiationInterface → DifferentiationInterfaceForwardDiffExt
   1346.5 msNLSolversBase
   2650.5 msLsqFit
  18 dependencies successfully precompiled in 10 seconds. 58 already precompiled.
Precompiling packages...
    486.2 msIntervalArithmetic → IntervalArithmeticDiffRulesExt
  1 dependency successfully precompiled in 1 seconds. 43 already precompiled.
Precompiling packages...
    625.6 msUnitful → ForwardDiffExt
  1 dependency successfully precompiled in 1 seconds. 22 already precompiled.
Precompiling packages...
   1271.9 msIntervalArithmetic → IntervalArithmeticForwardDiffExt
  1 dependency successfully precompiled in 2 seconds. 48 already precompiled.
Precompiling packages...
   1048.2 msForwardDiff → ForwardDiffStaticArraysExt
  1 dependency successfully precompiled in 1 seconds. 22 already precompiled.
Precompiling packages...
    437.8 msADTypes → ADTypesChainRulesCoreExt
  1 dependency successfully precompiled in 1 seconds. 9 already precompiled.
Precompiling packages...
    441.3 msDifferentiationInterface → DifferentiationInterfaceChainRulesCoreExt
  1 dependency successfully precompiled in 1 seconds. 11 already precompiled.
Precompiling packages...
    534.4 msDifferentiationInterface → DifferentiationInterfaceStaticArraysExt
  1 dependency successfully precompiled in 1 seconds. 10 already precompiled.
Precompiling packages...
    431.3 msArrayInterface → ArrayInterfaceChainRulesCoreExt
  1 dependency successfully precompiled in 1 seconds. 11 already precompiled.
Precompiling packages...
    527.4 msFiniteDiff → FiniteDiffStaticArraysExt
  1 dependency successfully precompiled in 1 seconds. 21 already precompiled.

Determining Significance

Significance is a genuinely helpful tool to help you think about your data critically. The greater scientific community demands the simplicity of presented p-values often in the form of dazzling and deceitful stars ***.

Here I focus on both the bread and butter significance tests, and will add very specific cases where I find significance calculations a little more sticky.

Determining significance between two sets of numbers

Use HypothesisTests.jl for statistical tests. Here we show a t-test unpaired data

using GLMakie, HypothesisTests

#Two independent groups
group1 = [120, 125, 130, 128, 135]
group2 = [100, 105, 110, 108, 115] #AKA Group W

#Perform unpaired t-test (Welch's t-test)
p_val = pvalue(UnequalVarianceTTest(group1, group2))

#a little funtion to get those dazzling stars
stars(p) = p < 0.001 ? "***" : p < 0.01 ? "**" : p < 0.05 ? "*" : "ns"

#Plot
fig = Figure(size = (600, 600))
ax = Axis(fig[1, 1], xticks = ([1, 2], ["Group 1", "Group W"]), ylabel = "Measurement")

#This thing (x -> 1)., is a micro-function that takes everything and sets it to 1 applied dot-wise
boxplot!(ax, (x -> 1).(group1) , group1, color = :lightblue, width = 0.4)
boxplot!(ax, (x -> 2).(group1) , group2, color = :lightcoral, width = 0.4)
scatter!(ax, (x -> 1).(group2) , group1, color = :blue)
scatter!(ax, (x -> 2).(group2) , group2, color = :red)

#Significance Bracket
y_max = max(maximum(group1), maximum(group1)) * 1.1
bracket_points = [(1, y_max - 2), (1, y_max), (2, y_max), (2, y_max - 2)]
lines!(ax, bracket_points, color = :black)
text!(ax, 1.5, y_max, text=stars(p_val), align=(:center, :bottom),fontsize=20)

#Adjust y-axis limits to ensure everything is visible
ylims!(ax, nothing, y_max * 1.05)

fig
Precompiling packages...
   2891.1 msRoots
   1731.0 msHypothesisTests
  2 dependencies successfully precompiled in 5 seconds. 61 already precompiled.
Precompiling packages...
    470.2 msAccessors → StructArraysExt
  1 dependency successfully precompiled in 1 seconds. 20 already precompiled.
Precompiling packages...
   1078.1 msSimpleBufferStream
    800.0 msBitFlags
   1084.4 msDelimitedFiles
   1090.9 msCodecZlib
   1119.6 msMeasures
   1143.9 msURIs
   1678.6 msUnzip
    642.1 msXorg_libICE_jll
    703.6 msAccessors → UnitfulExt
    782.2 msLoggingExtras
    762.9 msExceptionUnwrapping
   1033.1 msConcurrentUtilities
    753.4 msfzf_jll
   2115.1 msCrayons
    691.6 msmtdev_jll
    665.4 mslibevdev_jll
    686.2 mseudev_jll
    845.2 msMbedTLS_jll
    665.6 msXorg_xcb_util_jll
    706.6 msAccessors → StaticArraysExt
    760.2 msVulkan_Loader_jll
    728.7 msFFMPEG
   1510.2 msRecipesBase
    835.7 msXorg_libSM_jll
    808.8 mslibinput_jll
    893.5 msJLFzf
    881.1 msXorg_xcb_util_image_jll
   1907.8 msOpenSSL
    814.4 msXorg_xcb_util_keysyms_jll
    875.2 msXorg_xcb_util_renderutil_jll
   1590.2 msMbedTLS
    796.3 msXorg_xcb_util_wm_jll
   4709.1 msLatexify
    811.6 msXorg_xcb_util_cursor_jll
   3502.8 msPlotThemes
   4165.7 msMarchingCubes
    901.3 msLatexify → SparseArraysExt
   1525.5 msQt6Base_jll
   1693.7 msUnitfulLatexify
   3936.3 msRecipesPipeline
   2311.4 msQt6ShaderTools_jll
   2471.5 msGR_jll
   1913.9 msQt6Declarative_jll
    533.1 msQt6Wayland_jll
  12490.5 msHTTP
   3532.4 msGR
  36014.6 msUnicodePlots
   1493.8 msUnicodePlots → UnitfulExt
  36381.7 msPlots
   2185.1 msPlots → UnitfulExt
  50 dependencies successfully precompiled in 61 seconds. 154 already precompiled.
Precompiling packages...
    552.0 msRoots → RootsForwardDiffExt
  1 dependency successfully precompiled in 1 seconds. 31 already precompiled.
Precompiling packages...
    484.2 msRoots → RootsChainRulesCoreExt
  1 dependency successfully precompiled in 1 seconds. 19 already precompiled.

Determining if a slope is significantly different from 0

When fitting a line, we test if the slope could be statistically the same as zero just by chance. The null hypothesis is that the slope is 0. So we calculate how many standard errors the slope is from zero? Pop-quiz then, in the inverse case: How do you prove the slope is significantly similar to 0?

using GLMakie, GLM

x_data = 1:10
y_data = [2, 2, 3, 3, 3, 3, 4, 4, 4, 5]
X = hcat((x->1).(x_data), x_data) 

fit = lm(X, y_data)
coeftable(fit) #the coeftable of fit has p-values in it's 4 column
p_val = coeftable(fit).cols[4][2]

x_fit = range(x_data[1]-1,x_data[end]+1,100)
X_fit = hcat((x->1).(x_fit), x_fit)

#Predict the fit line and the 95% conf. int.
preds = predict(fit, X_fit, interval = :confidence)

#Plot
fig = Figure(size=(600, 600))
ax = Axis(fig[1, 1], xlabel="x", ylabel="y")

scatter!(ax, x_data, y_data, color = :black)
lines!(ax, x_fit, preds.prediction, color = :blue, linewidth=2)
band!(ax, x_fit, preds.lower, preds.upper, color=(:skyblue, 0.2))

#p-value text
p_text = p_val < 0.0001 ? "p < 0.0001" : "p = $(round(p_val, digits=4))"
text!(ax, 0.05, 0.95, text=p_text, space=:relative, align=(:left, :top), fontsize=16)

fig
Precompiling packages...
    417.9 msShiftedArrays
   1337.8 msStatsModels
   1466.9 msGLM
  3 dependencies successfully precompiled in 3 seconds. 53 already precompiled.

Determining if a slope is significantly 0

Ok so here is the tricky bit. What if it looks like there really is no relation, and we want to see if that is significant? Well… it’s hard. One thing you can do to make it easy is define a small range around zero where you would consider the slope to be functionally zero for your specific problem. Then you can test the hypothesis that it lies inside or outside of that range.


using GLM, Distributions

x_data = 1:20
y_data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 9.9, 10.1,
          9.8, 10.2, 10.0, 9.9, 10.1, 10.0, 9.8, 10.2, 9.9, 10.0]

#Let any slope between -0.1 and 0.1 be definitionally zero.
delta = 0.1

X = hcat(ones(length(x_data)), x_data)
fit = lm(X, y_data)

slope = coeftable(fit).cols[1][2]   
std_err = coeftable(fit).cols[2][2]
dof = dof_residual(fit)   # Degrees of freedom

#Perform the Two One-Sided T-Tests (requires Distributions.jl) ---
t_dist = TDist(dof) # Create a t-distribution object for our fit

#Is the slope significantly GREATER than the lower bound?
t_lower = (slope - (-delta)) / std_err
p_lower = 1 - cdf(t_dist, t_lower)

# Is the slope significantly LESS than the upper bound?
t_upper = (slope - delta) / std_err
p_upper = cdf(t_dist, t_upper)

#The final TOST p-value is the larger of the two
tost_p_value = max(p_lower, p_upper)

println("Equivalence Test Results (bounds = ±$delta):")
println("Slope: $(round(slope, digits=4)), SE: $(round(std_err, digits=4))")
println("TOST p-value: $(round(tost_p_value, digits=4))")

if tost_p_value < 0.05
    println("Conclusion: The slope is statistically equivalent to zero within delta=$delta.")
else
    println("Conclusion: We cannot conclude the slope is equivalent to zero within delta=$delta.")
end

Determining significance between two slopes

ANCOVA tests whether regression lines from different groups have different slope. This is a complex idea that’s worth spelling out here.

ANOVA tests if the means of two or more groups are statistically different.

The model is: \(Y_{ij} = \mu_i + \epsilon_{ij}\)

  • \(Y_{ij}\) is the measured value for the j-th data point in the i-th group.
  • \(\mu_i\) is the mean of group i.
  • \(\epsilon_{ij}\) is the difference of a data point and the mean.

The null hypothesis (\(H_0\)) is that all group means are equal: \(H_0: \mu_1 = \mu_2 = \dots = \mu_k\)

This is tested statistically using the F-statistic:

\(F = \frac{\text{Variance between groups}}{\text{Variance within groups}}\)

A large F-statistic indicates the variation between groups (the signal) is much larger than the random variation within groups (the noise). A p-value then determines if this F-statistic is large enough to reject the null hypothesis.

ANCOVA extends this by testing if the group means are different after controlling for a continuous variable, the Covariate.

The model becomes: \(Y_{ij} = \mu_i + \beta(X_{ij} - \bar{X}) + \epsilon_{ij}\)

  • \(\mu_i\) represents the adjusted mean of group i.
  • \(\beta\) is the coefficient quantifying the covariate’s effect.
  • \(X_{ij}\) is the covariate score for the j-th person in the i-th group.
  • \(\bar{X}\) is the average of all covariate scores across all groups.

ANCOVA uses the same F-test logic to test two primary hypotheses:

  • Null Hypothesis (\(H_0\)): All adjusted group means are equal. \(H_0: \mu_1 = \mu_2 = \dots = \mu_k\)

  • Null Hypothesis (\(H_0\)): The regression coefficient for the covariate is zero (i.e., there is no linear relationship). \(H_0: \beta = 0\)

In code, this ANCOVA test is:

using GLM, DataFrames, GLMakie, StatsModels

#Define the data for each group separately for clarity
A_X = [1.538, 6.994, 1.019, 6.603, 7.603, 4.914, 8.773, 4.021, 7.186, 6.695]
A_Y = [9.424, 17.639, 6.949, 14.459, 21.308, 14.703, 25.158, 12.910, 23.893, 19.834]

B_X = [3.746, 5.175, 0.720, 9.524, 4.982, 6.769, 7.224, 9.479, 1.495, 1.249]
B_Y = [14.650, 19.031, 9.555, 29.480, 20.433, 23.441, 22.306, 25.264, 6.952, 12.501]

C_X = [0.560, 4.536, 8.414, 5.485, 2.953, 8.107, 2.994, 7.255, 5.984, 3.541]
C_Y = [3.650, 22.332, 41.558, 26.173, 15.339, 38.155, 16.305, 25.017, 27.204, 18.539]

#Combine the separate group data into a single DataFrame for analysis
data = vcat(
    DataFrame(X = A_X, Y = A_Y, Group = "A"),
    DataFrame(X = B_X, Y = B_Y, Group = "B"),
    DataFrame(X = C_X, Y = C_Y, Group = "C")
)

#Little function to get stars
stars(p) = p < 0.001 ? "***" : p < 0.01 ? "**" : p < 0.05 ? "*" : "ns"

# Function to reorder groups to help with multiple pairs
function reorder_groups(df, ref_group)
    ref_rows = filter(row -> row.Group == ref_group, df)
    other_rows = filter(row -> row.Group != ref_group, df)
    return vcat(ref_rows, other_rows)
end

# Fit models with different reference groups
data_A_ref = reorder_groups(data, "A")
ancova_A = lm(@formula(Y ~ 1 + X + Group + X & Group), data_A_ref)
ct_A = coeftable(ancova_A)

data_B_ref = reorder_groups(data, "B")
ancova_B = lm(@formula(Y ~ 1 + X + Group + X & Group), data_B_ref)
ct_B = coeftable(ancova_B)

get_p(model, term) = coeftable(model).cols[4][findfirst(==(term), coeftable(model).rownms)]
p_AB = get_p(ancova_A, "X & Group: B")
p_AC = get_p(ancova_A, "X & Group: C")
p_BC = get_p(ancova_B, "X & Group: C")

fig = Figure(size = (600, 600))
ax = Axis(fig[1, 1])

colors = Dict("A" => :blue, "B" => :orange, "C" => :green)
for group in ["A", "B", "C"]
    subset = filter(row -> row.Group == group, data)
    scatter!(ax, subset.X, subset.Y, color = colors[group])
    
    x_range = range(extrema(subset.X)..., length=100)
    predict_df = DataFrame(X = x_range, Group = group)
    y_pred = predict(ancova_A, predict_df)
    y_pred_float = [y for y in y_pred]

    lines!(ax, x_range, y_pred_float, color = colors[group], linewidth = 2.5)
    text!(ax, x_range[end], y_pred_float[end], text = " $group", align = (:left, :center), color = colors[group])
end

legend_text = """
Slope Significance
A vs B: $(stars(p_AB))
A vs C: $(stars(p_AC))
B vs C: $(stars(p_BC))
"""

# Create a container for the legend
legend_layout = GridLayout(tellwidth = false, tellheight = false, halign = 0.98, valign = 0.05)
fig[1, 1] = legend_layout

Box(legend_layout[1, 1], color = (:white, 0.85), strokecolor = :black, strokewidth = 1)
Label(legend_layout[1, 1], legend_text, padding = (10, 10, 8, 8))

fig
Precompiling packages...
    456.1 msInvertedIndices
    523.4 msPooledArrays
    695.4 msInlineStrings
   1313.1 msStringManipulation
   1497.3 msSentinelArrays
  11195.6 msPrettyTables
  29124.9 msDataFrames
  7 dependencies successfully precompiled in 42 seconds. 28 already precompiled.
Precompiling packages...
    468.3 msInlineStrings → ParsersExt
  1 dependency successfully precompiled in 1 seconds. 9 already precompiled.