    Explanations

    possessive forms and contractions

    Negative Logits
    "combe"     -0.16
    " aktu"     -0.15
    "ugh"       -0.15
    "anzi"      -0.15
    "inson"     -0.14
    "الات"      -0.14
    "inha"      -0.14
    "ino"       -0.14
    " Colo"     -0.14
    "isan"      -0.14
    Positive Logits
    " safe"     0.39
    "safe"      0.31
    " Safe"     0.29
    " fair"     0.29
    "Safe"      0.28
    "-safe"     0.27
    " hard"     0.24
    "fair"      0.23
    "_safe"     0.23
    " SAFE"     0.21
    Activation Density 0.091%

    No Known Activations