INDEX
Explanations
words related to attitudes and perceptions
Negative Logits
Token     Logit
zes       -0.17
cers      -0.16
jedn      -0.16
pez       -0.15
лиц       -0.15
cies      -0.15
婷        -0.14
IGIN      -0.14
idding    -0.14
ej        -0.14
Positive Logits
Token     Logit
itude     0.38
ending    0.36
itudes    0.36
orney     0.35
acking    0.34
endant    0.33
acked     0.33
ended     0.31
orneys    0.31
acker     0.31
Activations Density: 0.011%