INDEX
Explanations
references to spoilers in various contexts
Negative Logits
agi      -0.16
ument    -0.16
mae      -0.15
inine    -0.15
mente    -0.14
elic     -0.14
patial   -0.14
Koch     -0.14
/INFO    -0.14
ELY      -0.14
Positive Logits
spo      0.30
Spo      0.26
Spo      0.25
spo      0.22
ilers    0.21
spoil    0.21
ilt      0.21
orth     0.19
spoiler  0.18
elman    0.18
Activations Density 0.007%