INDEX
Explanations
specific names, particularly the name "Roger" with varying intensities
mentions of the name "Roger."
Negative Logits
runtime   -0.80
ipeg      -0.79
rises     -0.73
amaz      -0.71
esthetic  -0.69
places    -0.67
ihar      -0.67
pai       -0.66
spir      -0.65
nect      -0.65
Positive Logits
Goodell   0.98
Zel       0.87
Waters    0.83
rique     0.79
Rabbit    0.79
Ding      0.78
Godd      0.78
rers      0.77
Peck      0.75
Bacon     0.75
Activations Density: 0.012%