Explanations
references to the 1980s and vintage pop culture elements
Negative Logits
imest         -0.16
ationToken    -0.15
-disabled     -0.14
inker         -0.14
iqu           -0.14
rone          -0.14
lose          -0.14
201           -0.14
194           -0.14
.ng           -0.14
Positive Logits
-'            0.26
/'            0.26
s             0.18
era           0.18
-era          0.17
/"            0.17
-"            0.15
Era           0.15
era           0.14
ilor          0.14
Activations Density 0.006%