Explanations
words related to negation and differential expressions
Negative Logits
pysty          -0.57
persones       -0.55
føl            -0.55
umana          -0.54
nemico         -0.53
ńskich         -0.53
purposes       -0.52
įsi            -0.52
menschliche    -0.51
desselben      -0.51
Positive Logits
schon           0.98
noch            0.82
windowFixed     0.71
только          0.70
gerade          0.69
)))));          0.68
nog             0.67
Только          0.66
רק              0.65
тільки          0.63
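The dashboard does not state how the logit lists are produced. A common approach for sparse-autoencoder features, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then report the most positive and most negative scores. The minimal sketch below follows that assumption. The function names and the two-dimensional toy vectors are hypothetical and are not taken from the dashboard.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by dot(decoder direction, token unembedding).

    Assumed definition of a 'logit' entry; hypothetical helper.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k highest-scoring and k lowest-scoring tokens."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Hypothetical toy data: a 2-d decoder direction and three token embeddings.
decoder_dir = [1.0, -0.5]
unembed = {
    "schon": [0.8, -0.4],
    "noch": [0.6, -0.4],
    "pysty": [-0.3, 0.5],
}
scores = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(scores, k=1)
print(positive, negative)
```

With real model weights, the same ranking applied over the full vocabulary would yield lists shaped like the two tables above.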
Activations Density 0.372%
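A density of 0.372% means the feature fires on roughly 372 of every 100,000 tokens. The dashboard does not define the threshold it uses. The minimal sketch below assumes the usual definition: the percentage of token positions whose activation is strictly greater than zero. The function name and the sample data are hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`.

    Assumes "active" means strictly greater than the threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 372 active positions out of 100,000 tokens.
sample = [1.0] * 372 + [0.0] * (100_000 - 372)
print(f"{activation_density(sample):.3f}%")
```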