INDEX
    Explanations

    words expressing absolute states or conditions

    Negative Logits
    "ree"      -0.17
    "867"      -0.17
    "olt"      -0.15
    "esis"     -0.15
    "ette"     -0.14
    "roe"      -0.14
    "vil"      -0.14
    " Prim"    -0.14
    "Ñģим"     -0.14
    " prim"    -0.14
    Positive Logits
    " completely"    0.18
    " entirely"      0.16
    "ayah"           0.15
    "ajan"           0.14
    "addon"          0.14
    "å½»"            0.14
    "enario"         0.14
    "ÑĨÑı"           0.14
    "enou"           0.13
    " Hra"           0.13
    Activation Density 0.044%

    No Known Activations