INDEX
Explanations
Punctuation, specifically periods at the end of sentences
Negative Logits
YG        -0.17
comm      -0.16
ooks      -0.15
Hatch     -0.15
cur       -0.14
.cur      -0.14
IVE       -0.14
heel      -0.14
ifica     -0.14
ive       -0.14

Positive Logits
astr       0.17
ewn        0.15
arin       0.15
imson      0.14
olem       0.14
PEND       0.14
мага       0.14
alink      0.13
PLY        0.13
emachine   0.13

Activations Density: 0.003%