INDEX
    Explanations
    Negative Logits
    " випад"        -0.07
    "ouch"          -0.07
    " karıştır"     -0.07
    "ODO"           -0.07
    " Undert"       -0.07
    "Comput"        -0.07
    "ourt"          -0.06
    "ουργ"          -0.06
                    -0.06
    "itur"          -0.06
    Positive Logits
    " Ben"          0.19
    "Ben"           0.17
    " ben"          0.14
    " Benjamin"     0.13
    " BEN"          0.12
    "ben"           0.11
    " Chen"         0.08
    " Sam"          0.08
    "iben"          0.08
    " Benn"         0.08
    Activation Density: 0.008%

    No Known Activations