    Explanations

    phrases indicating legal judgments or findings of guilt

    Negative Logits
    "avra"       -0.19
    "ebek"       -0.18
    "ADOS"       -0.16
    "uele"       -0.16
    "ARIABLE"    -0.15
    " oblig"     -0.15
    "ぞ"         -0.15
    "롱"         -0.15
    "lač"        -0.15
    "ivid"       -0.14
    Positive Logits
    " fit"       0.35
    " Fit"       0.28
    " guilty"    0.28
    "Fit"        0.27
    "-fit"       0.26
    "fit"        0.26
    " fitness"   0.23
    " unfit"     0.23
    " worthy"    0.21
    ".fit"       0.20
    Act Density 0.091%

    No Known Activations
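The logit lists above are most likely produced the way SAE feature dashboards usually produce them: project the feature's decoder direction through the model's unembedding matrix and keep the largest and smallest entries. Act density is taken here to mean the fraction of tokens on which the feature's activation is positive. The Python sketch below illustrates both under those assumptions; the function names (logit_effects, top_logits, act_density) and the toy vectors are hypothetical and do not come from any real model or library.

```python
# Hypothetical sketch: assumes the dashboard's logit lists come from projecting
# one SAE feature's decoder direction onto the unembedding matrix.
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vector))
        for token, vector in unembed.items()
    }


def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return positive, negative


def act_density(activations: List[float]) -> float:
    """Assumed definition: fraction of tokens with activation > 0."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


if __name__ == "__main__":
    # Toy 3-dimensional residual stream and four-token vocabulary (made-up numbers).
    decoder_dir = [0.5, -0.2, 0.1]
    unembed = {
        " fit": [0.6, -0.1, 0.2],
        " guilty": [0.4, 0.0, 0.3],
        "avra": [-0.3, 0.2, 0.0],
        "ivid": [-0.1, 0.3, -0.2],
    }
    effects = logit_effects(decoder_dir, unembed)
    pos, neg = top_logits(effects, k=2)
    print("positive:", pos)
    print("negative:", neg)
    print("density:", act_density([0.0, 0.0, 1.3, 0.0]))
```

With a real model, decoder_dir would be one row of the SAE decoder and unembed would cover the whole vocabulary, so a vectorized matrix product would replace the per-token loop; the ranking step stays the same.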