INDEX
    Explanations
    NEGATIVE LOGITS
    Token           Logit
    yk              -0.19
    inspection      -0.18
    noinspection    -0.17
    rup             -0.17
    ires            -0.17
    yor             -0.16
    mtree           -0.16
    iê              -0.15
    elly            -0.15
    amins           -0.15
    POSITIVE LOGITS
    Token           Logit
    bed             0.35
    loid            0.34
    bing            0.30
    ular            0.28
    ulation         0.26
    lero            0.26
    by              0.25
    ulated          0.25
    ulate           0.25
    ulations        0.24
    Activation Density: 0.009%

    No Known Activations