    Negative Logits
    Logit   Token
    -0.08   (not captured)
    -0.08   " estable"
    -0.07   " panc"
    -0.07   " सी"
    -0.07   " alert"
    -0.07   " reserv"
    -0.07   " punches"
    -0.07   " män"
    -0.07   "ahid"
    -0.07   " غوښت"
    Positive Logits
    Logit   Token
    0.08    "attaque"
    0.08    "gewicht"
    0.08    "きを"
    0.08    " horário"
    0.07    " మాట్ల"
    0.07    "wissen"
    0.07    "正规"
    0.07    "več"
    0.07    "Directional"
    0.07    " Else"
    Activation Density: 0.005%

    No Known Activations