INDEX
Explanations
negations, indicating what something is not
instances where something is stated as not occurring or being present
Negative Logits
Token    Logit
LG       -0.75
ngth     -0.68
ixel     -0.68
creen    -0.68
Tours    -0.65
¿½       -0.64
kamp     -0.64
hower    -0.64
gha      -0.63
ĸļ       -0.62
Positive Logits
Token          Logit
necessarily    1.27
icably         1.16
icable         1.13
epad           1.09
withstanding   1.06
etheless       1.05
eworthy        0.96
bothered       0.91
bothering      0.88
ifies          0.85
Activations Density 0.065%
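
To give a sense of scale for the density figure, here is a minimal sketch. It assumes that "Activations Density" means the fraction of sampled tokens on which the feature has a nonzero activation; the page itself does not define the term. The function names are hypothetical and are not part of any dashboard API.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected count of tokens with nonzero activation.

    Assumption: density_percent is the percentage of sampled tokens
    on which the feature fires at all.
    """
    return density_percent / 100.0 * n_tokens


def tokens_per_activation(density_percent: float) -> float:
    """Average number of tokens between activations (inverse of the density)."""
    return 100.0 / density_percent


density_percent = 0.065  # value reported on the page

# Roughly 650 active tokens per million sampled tokens.
print(round(expected_active_tokens(density_percent, 1_000_000)))

# Roughly one activation every ~1538 tokens.
print(round(tokens_per_activation(density_percent)))
```

Under that assumption the feature is sparse: it fires on about 650 of every million tokens, which fits a feature tied to a specific construction such as negation.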