    Explanations

    explanations or descriptions

    Negative Logits
        " maksi"        -0.82
        " seksi"        -0.80
        " silikon"      -0.80
        " kado"         -0.79
        " keramik"      -0.76
        " akut"         -0.76
        " kafe"         -0.74
        " lele"         -0.73
        " krim"         -0.73
        " tomat"        -0.72
    Positive Logits
        " explain"      1.17
        " explanations" 1.05
        " explanation"  1.04
        " explaining"   1.02
        "Explain"       1.01
        " Explain"      1.01
        " explains"     1.00
        "explain"       0.98
        " explained"    0.97
        " why"          0.83
    Activation Density: 0.109%
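Activation density is the percentage of tokens in a reference dataset on which the feature fires. A minimal sketch, assuming a token counts as active when the feature's activation is strictly greater than zero; the function name, the threshold parameter, and the toy data are illustrative assumptions, not the data behind this page:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: the feature fires on 1 token out of 1,000.
acts = [0.0] * 999 + [2.5]
print(f"{activation_density(acts):.3f}%")  # prints 0.100%
```

At this scale, a density of 0.109% corresponds to roughly 1 active token in every 900 or so.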

    No Known Activations