INDEX
Explanations
phrases indicating frequency or repetition
Negative Logits
Token             Logit
mouth             -0.85
icide             -0.77
illus             -0.76
ican              -0.74
heit              -0.73
initialized       -0.73
yout              -0.72
ices              -0.71
TPPStreamerBot    -0.71
bush              -0.70
Positive Logits
Token             Logit
afar              1.36
whence            1.00
scratch           0.97
outset            0.88
experience        0.80
Above             0.77
examining         0.76
standpoint        0.75
observing         0.74
watching          0.73
Activations Density: 0.040%