INDEX
Explanations
mentions of family members and authority figures in relation to decision-making
Negative Logits
(          -0.16
,          -0.15
ponents    -0.15
iddle      -0.14
Hed        -0.14
ảo         -0.14
Scar       -0.14
emer       -0.14
omat       -0.14
maker      -0.14
Positive Logits
readcr     0.16
ylie       0.15
ãĥĥãĥģ     0.15
سÙĪ        0.14
Dims       0.14
oyal       0.14
ouz        0.14
EXEMPLARY  0.14
aggable    0.13
ünd        0.13
Activations Density: 0.371%