INDEX
Explanations
words related to contrasts between different social or political groups
Negative Logits
lenker          -0.82
Personendaten   -0.72
étroite         -0.70
AutoScale       -0.67
MemoryWarning   -0.63
vPvB            -0.59
strong          -0.59
ngdoc           -0.58
sólido          -0.58
cherchés        -0.58
Positive Logits
if              0.86
if              0.76
but             0.64
если            0.64
though          0.62
though          0.61
but             0.60
although        0.60
jika            0.59
0.57
Activations Density 0.610%