Explanations
mentions of the color "gray" or variations of it
Negative Logits
nect        -0.15
agy         -0.14
aight       -0.14
кÑĢаÑĹ      -0.14
McGr        -0.14
ãĥ³ãĥī      -0.14
pon         -0.14
.invoke     -0.13
Fairfield   -0.13
åŃ          -0.13
Positive Logits
Gad         0.16
bi          0.15
fisse       0.15
ifix        0.15
ccione      0.14
atasets     0.14
DEST        0.14
оби         0.14
ì±ħ         0.14
cntl        0.14
Activations Density: 0.006%