    NEGATIVE LOGITS
    "DA"                 -1.36
    "da"                 -1.34
    "Da"                 -0.81
    " DA"                -0.80
    "daw"                -0.65
    " Da"                -0.65
    " da"                -0.64
    "das"                -0.57
    "dai"                -0.55
    "principalColumn"    -0.53
    POSITIVE LOGITS
    " Hira"              0.53
    "Географи"           0.52
    " behind"            0.51
    "··"                 0.51
    " Heidelberg"        0.51
    " Hurley"            0.50
    "ządz"               0.50
    "BuilderFactory"     0.49
    " Fav"               0.49
    "inely"              0.49
    Activation Density: 0.045%

    No Known Activations