INDEX
Explanations
references to personal ownership or possession
Negative Logits
ress       -0.19
RESS       -0.16
agens      -0.16
igger      -0.16
arium      -0.15
s          -0.15
anga       -0.14
markers    -0.14
ise        -0.14
expl       -0.14
Positive Logits
оÑĪ        0.17
redi       0.15
awner      0.14
zcze       0.14
_GU        0.14
ulls       0.14
991        0.13
ascus      0.13
729        0.13
vetica     0.13
Activations Density: 0.010%