INDEX
Explanations
negative portrayals of individuals, particularly focusing on characteristics such as arrogance and hypocrisy
Negative Logits
iſt              -0.71
\\               -0.68
Hozzáférés       -0.66
,¹               -0.63
Lister           -0.60
XNUMX            -0.59
^(@)             -0.57
utafitiHapana    -0.57
ſind             -0.56
\\               -0.54
Positive Logits
IIRC             0.65
körül            0.63
TestingModule    0.61
culturelle       0.59
aurait           0.59
romero           0.57
annuel           0.55
ggf              0.55
OFDb             0.55
inderdaad        0.55
Activations Density 0.611%