Explanations
phrases that emphasize temporal context or repetition
Negative Logits
atta      -0.15
uraa      -0.14
313       -0.14
gether    -0.14
nee       -0.14
kk        -0.14
Porn      -0.14
-house    -0.13
314       -0.13
cai       -0.13
Positive Logits
best      0.22
odds      0.22
elic      0.20
-best     0.20
stake     0.20
(best     0.19
best      0.19
Odds      0.18
issue     0.17
minimum   0.17
Activations Density: 0.063%