INDEX
    Explanations

    phrases related to interactions with authority figures

    phrases relating to balance and decision-making

    Negative Logits
    Token              Logit
    " respectively"    -0.83
    "}."               -0.81
    "?)."              -0.73
    "`."               -0.72
    "))."              -0.72
    "*."               -0.71
    ".)."              -0.70
    "'."               -0.68
    "$."               -0.68
    "+."               -0.67
    Positive Logits
    Token              Logit
    " his"             0.55
    " himself"         0.54
    " wheelchair"      0.48
    " resignation"     0.48
    " apologise"       0.47
    "cohol"            0.47
    " Leeds"           0.46
    "anus"             0.46
    " virginity"       0.46
    " composure"       0.45
    Activation Density: 1.939%

    No Known Activations