INDEX
    Explanations

    punishing someone or not

    Negative Logits
     wit            0.49
     ifs            0.49
     Leslie         0.47
     ach            0.46
                    0.45
     Anglic         0.44
     יו             0.44
                    0.44
    เด็ก            0.43
     ANGE           0.43
    Positive Logits
     continuando    0.49
     comporta       0.49
    galaxys         0.47
    toluene         0.44
    proton          0.44
     diventare      0.43
    izzando         0.43
    િલ              0.42
    ezing           0.42
    pais            0.42
    Activation Density: 0.000%

    No Known Activations