Explanations
references to authority and its impact on actions and outcomes
Negative Logits
millenn         -0.16
dramatic        -0.15
erotiske        -0.15
alon            -0.14
dramatically    -0.14
mythology       -0.14
erotik          -0.13
слив            -0.13
drama           -0.13
Invoker         -0.13
Positive Logits
[               0.24
[=              0.20
aforementioned  0.18
yourselves      0.17
everyone        0.17
everybody       0.17
Our             0.16
very            0.16
(...)           0.16
Your            0.16
Activations Density 0.010%