INDEX
Explanations
words indicating contrast or comparison
Negative Logits
Pyram      -0.65
PMailer    -0.64
targ       -0.62
camb       -0.62
Quan       -0.61
pron       -0.61
fout       -0.61
Varan      -0.61
Targ       -0.60
CPE        -0.60
Positive Logits
abstrait    0.65
abstrato    0.59
Italij      0.58
zijne       0.56
aislados    0.52
démocr      0.52
étoient     0.52
avoient     0.52
Wikiseite   0.51
infierno    0.51
Activations Density: 0.365%