Explanations
aspects related to emotional responses to social dynamics
Negative Logits
autorytatywna   -1.09
myſelf          -1.08
Efq             -1.00
itſelf          -0.99
aarrggbb        -0.94
ſelf            -0.92
OGND            -0.90
ſelves          -0.90
houſe           -0.89
themſelves      -0.88
Positive Logits
.               0.56
,               0.50
~               0.47
[               0.45
lost            0.44
zeug            0.44
ias             0.42
...             0.42
萌              0.42
[               0.41
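Dashboards of this kind usually build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then reading off the highest- and lowest-scoring tokens. The source does not state the method, so the sketch below assumes it. The function name `logit_extremes`, the five-token vocabulary, and all weights are hypothetical toy values. They are not the real numbers behind the lists above.

```python
def logit_extremes(decoder_dir, unembed, vocab, k):
    """Project a feature's decoder direction onto every vocabulary token.

    Assumes the logit-lens style method described above (hypothetical helper).
    decoder_dir: d_model floats
    unembed:     d_model rows, each holding one float per vocab token (W_U)
    vocab:       token strings, one per column of unembed
    Returns (bottom_k, top_k) as lists of (token, logit) pairs.
    """
    d_model = len(decoder_dir)
    # logits[j] = sum_i decoder_dir[i] * unembed[i][j]  (vector times matrix)
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    bottom_k = ranked[:k]        # most negative first
    top_k = ranked[::-1][:k]     # most positive first
    return bottom_k, top_k


# Toy, made-up weights: d_model = 2, vocabulary of 5 tokens.
vocab = ["self", "house", ".", ",", "lost"]
decoder_dir = [1.0, -0.5]
unembed = [
    [-0.8, -0.6, 0.4, 0.3, 0.1],
    [0.4, 0.2, -0.2, -0.2, 0.0],
]
bottom, top = logit_extremes(decoder_dir, unembed, vocab, k=2)
print("negative:", [t for t, _ in bottom])  # ['self', 'house']
print("positive:", [t for t, _ in top])     # ['.', ',']
```

Under this assumption, a token lands in the negative list when the feature's write direction points away from that token's unembedding column. It lands in the positive list when the two align.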
Activation Density: 0.254%
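Activation density is commonly defined as the percentage of sampled tokens on which the feature's activation is above zero. The sketch below assumes that definition, since the source does not spell it out. The function name `activation_density` and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper; assumes "active" means activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1000 token activations, 3 of them nonzero.
sample = [0.0] * 1000
sample[10] = 1.7
sample[400] = 0.3
sample[999] = 2.2
print(f"{activation_density(sample):.3f}%")  # 0.300%
```

By the same arithmetic, the reported 0.254% corresponds to the feature being active on about 2.5 of every 1,000 tokens.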