INDEX
Explanations
expressions of self-reference and personal pronouns
Negative Logits
pleaſure         -0.78
AssemblyTitle    -0.75
صوتيه            -0.70
myſelf           -0.69
poffible         -0.68
Personensuche    -0.67
OGND             -0.62
itſelf           -0.60
betweenstory     -0.60
Efq              -0.59
Positive Logits
vă         0.62
nos        0.58
haberse    0.54
vous       0.54
lhes       0.54
θα         0.53
își        0.52
se         0.50
îi         0.50
îl         0.50
Activations Density 0.100%
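
For context on the density figure, here is a minimal sketch of how such a number could be computed, assuming that "activations density" means the fraction of token positions on which the feature activates above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is a simple count of active positions divided by the
    total number of positions. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 of 1,000 token positions.
sample = [0.0] * 999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.100%
```

Under that assumption, a density of 0.100% corresponds to the feature being active on roughly one token in a thousand.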