INDEX
Explanations
negative phrases or words that express disagreement or denial
Negative Logits
olumn     -0.19
isoft     -0.15
eldre     -0.15
Äįe       -0.15
nám       -0.15
undan     -0.14
itore     -0.14
imity     -0.14
ominator  -0.14
ayout     -0.14
Positive Logits
ably          0.37
ing           0.34
withstanding  0.34
able          0.33
ed            0.32
ions          0.28
many          0.27
ion           0.27
icing         0.27
only          0.26
Activations Density 0.032%