Negative Logits
uriye      -0.08
?,         -0.07
Dzięki     -0.07
Hep        -0.07
Men        -0.07
Jean       -0.07
Pr         -0.07
că         -0.07
Tek        -0.07
Say        -0.07
Positive Logits
_SN           0.08
แบ            0.08
humiliation   0.08
Ys            0.08
_act          0.08
QUIRED        0.08
aine          0.07
LAST          0.07
begged        0.07
admittedly    0.07
Activations Density 0.000%