Explanations
references to entities or items whose names contain the word "Red"
mentions of the word "Red"
Negative Logits
SPONSORED   -0.88
ILA         -0.81
4090        -0.76
ISTORY      -0.74
OHN         -0.74
gerald      -0.72
POLIT       -0.70
ammad       -0.70
ATURES      -0.70
Õ           -0.70
Positive Logits
uces        1.07
Sox         1.06
ucer        0.99
ucing       0.99
cliffe      0.96
neck        0.96
eem         0.95
rawn        0.94
emption     0.93
oubt        0.92
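Lists like the two above are usually produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes exactly that convention. The dashboard does not state its normalization, and the names `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Sketch (assumed convention): direct effect of a feature on each token's logit.

    decoder_dir: (d_model,) decoder vector for the feature (hypothetical input).
    W_U:         (d_model, vocab_size) unembedding matrix (hypothetical input).
    vocab:       list of token strings, length vocab_size.
    Returns (most_negative, most_positive), each a list of (token, value) pairs.
    """
    effects = decoder_dir @ W_U   # shape (vocab_size,)
    order = np.argsort(effects)   # indices sorted by ascending effect
    most_negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    most_positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return most_negative, most_positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["Sox", "uces", "ILA", "the"]
W_U = np.array([[1.0, 0.4, -1.0, 0.0],
                [0.0, 0.5, -0.5, 0.1]])
decoder_dir = np.array([1.0, 1.0])
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # most negative first
print(pos)  # most positive first
```

Sorting once and slicing from both ends gives both lists from a single `argsort`. This is a reasonable choice for a vocabulary-sized vector.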
Activation Density: 0.017%
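Activation density is commonly defined as the fraction of tokens on which the feature's activation exceeds zero. Under that assumed definition (the dashboard does not spell out its threshold), 0.017% means roughly 17 firing tokens per 100,000. The helper name `activation_density` below is hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy data: 100,000 tokens, of which the feature fires on 17.
acts = np.zeros(100_000)
acts[:17] = 2.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # formatted as a percentage
```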