    Negative Logits
      " Stuart"     -0.17
      " McKay"      -0.16
      " Roths"      -0.15
      "zÅij"        -0.14
      " Byron"      -0.14
      " decre"      -0.14
      "éϵ"          -0.14
      " Twe"        -0.14
      "gaard"       -0.14
      "ROY"         -0.13
    Positive Logits
      " Thomas"     0.24
      " Humph"      0.20
      " Hum"        0.20
      " Roger"      0.19
      " Barth"      0.19
      " Ralph"      0.19
      " Nicholas"   0.18
      "Thomas"      0.18
      " John"       0.18
      " Sym"        0.17
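The two lists above are the tokens with the largest and most negative logit effects for this feature. A minimal sketch of how such top-k lists can be picked from a per-token score table follows. It assumes the scores are already computed (on dashboards like this one they typically come from projecting the feature's decoder direction through the model's unembedding, which is an assumption here); the `logit_effects` dictionary and the `top_k` helper are hypothetical and reuse a few values from the lists.

```python
# Hypothetical per-token logit-effect scores, copied from a few entries above.
# Assumption: real scores would come from projecting the feature's decoder
# direction through the model's unembedding matrix.
logit_effects = {
    " Thomas": 0.24, " Humph": 0.20, " Roger": 0.19, " John": 0.18,
    " Stuart": -0.17, " McKay": -0.16, " Roths": -0.15, "ROY": -0.13,
}


def top_k(effects, k, positive=True):
    """Return the k (token, score) pairs with the largest scores when
    positive=True, or the most negative scores when positive=False."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


print(top_k(logit_effects, 3, positive=True))
print(top_k(logit_effects, 3, positive=False))
```

Quoting each token keeps its leading space visible, which is why `" Thomas"` and `"Thomas"` appear as separate entries in the positive list.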
    Act Density 0.015%
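Assuming "Act Density" means the percentage of token positions where the feature's activation is above zero, 0.015% works out to about 15 active positions per 100,000 tokens. The sketch below computes that percentage; the `activation_density` helper, its `threshold` parameter, and the activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: density is measured as active positions / all positions.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical example: 3 active positions out of 20,000 tokens.
acts = [0.0] * 20_000
for i in (10, 5_000, 15_000):
    acts[i] = 1.2

print(activation_density(acts))  # 0.015
```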

    No Known Activations