Explanations
mentions of the color red
Negative Logits
Token            Logit
uador            -0.76
=-=-             -0.73
PsyNetMessage    -0.72
ilities          -0.72
cffffcc          -0.71
rolet            -0.71
vre              -0.71
=-=-=-=-         -0.69
apist            -0.67
BIL              -0.67
Positive Logits
Token            Logit
beard            1.14
berry            1.10
stone            1.05
heads            1.03
bird             1.02
horse            1.01
haired           0.99
headed           0.99
hawk             0.98
fish             0.97
Activations Density: 3.003%