INDEX
    Explanations

    phrases that express varying degrees of uncertainty or slightness

    NEGATIVE LOGITS
    opup          -0.16
     little       -0.16
     bardzo       -0.15
    illos         -0.15
     vraiment     -0.15
     seemingly    -0.14
     entirely     -0.14
     totally      -0.14
     very         -0.14
     absolutely   -0.14
    POSITIVE LOGITS
    /stdc         0.19
    .ly           0.19
    ingly         0.19
     æħ           0.18
     different    0.18
     TOO          0.17
     like         0.17
    586           0.17
    umen          0.16
    different     0.16
    Act Density 0.050%

    No Known Activations