INDEX
Explanations
Instances of the substring "wh", indicating a focus on words or phrases that begin with "wh".
Negative Logits
ously   -0.18
abel    -0.16
èIJ     -0.15
itura   -0.15
Hra     -0.15
esti    -0.15
hes     -0.15
Bull    -0.15
IVEN    -0.14
lor     -0.14
Positive Logits
wh      0.27
-wh     0.21
ining   0.20
izz     0.19
ipl     0.18
iners   0.18
.wh     0.18
ack     0.18
eras    0.18
oso     0.17
Activations Density 0.011%
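
As a rough illustration of the density figure above: this sketch assumes it is the fraction of sampled tokens on which the feature fires (activation above zero). That definition, the function name activation_density, and the sample data are assumptions for illustration, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature is active; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 11 active tokens out of 100,000 is a density of 0.011%.
sample = [0.0] * 99_989 + [1.5] * 11
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is active on roughly one token in nine thousand, which fits a feature tied to a narrow substring pattern.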