INDEX
    Explanations
    Negative Logits
     respaldo    -0.08
    ષ્ણ    -0.08
    ખ્ય    -0.08
     voordelen    -0.07
    રી    -0.07
    leti    -0.07
    sell    -0.07
    vox    -0.07
    ર્ણ    -0.07
     knives    -0.07
    Positive Logits
     freshman    0.10
     freshmen    0.10
     iniz    0.09
    -mid    0.09
     BASIS    0.08
     থেকেই    0.08
    0.08
    0.08
    _INIT    0.07
    参加    0.07
    Activation Density: 0.007%

    No Known Activations