INDEX
Explanations
mentions of or references to ghosts
the repeated mention of the term "ghost"
Negative Logits
erity         -0.79
Effective     -0.70
tics          -0.70
onian         -0.66
enegger       -0.66
tical         -0.64
ventions      -0.63
oulos         -0.62
YC            -0.59
percentile    -0.59
Positive Logits
busters       1.15
buster        1.10
writer        0.99
written       0.92
writing       0.88
haunting      0.86
ly            0.85
door          0.83
liness        0.80
writers       0.80
Activations Density: 0.049%
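
A density figure like the one above is commonly read as the share of sampled tokens on which the feature fires. The sketch below shows one way such a percentage could be computed. It is a minimal illustration under assumptions: the function name `activation_density`, the zero threshold, and the sample data (49 active tokens out of 100,000) are all hypothetical and chosen only to reproduce a value of the same size, not taken from this page's underlying data.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 49 active tokens among 100,000 sampled tokens.
sample = [1.0] * 49 + [0.0] * 99_951
print(f"{activation_density(sample):.3f}%")  # prints "0.049%"
```

A density this low means the feature is sparse: it is silent on the vast majority of tokens and fires only in narrow contexts.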