INDEX
Explanations
mentions or references to the word "Orange"
references to "Orange," likely related to a specific location or entity
Negative Logits
Token      Logit
hurd       -0.76
irm        -0.70
--         -0.66
dj         -0.65
sha        -0.65
preempt    -0.64
streng     -0.64
lihood     -0.63
)--        -0.63
hw         -0.63
Positive Logits
Token      Logit
Orange     3.94
Orange     3.37
orange     2.03
orange     1.92
oranges    1.59
Purple     1.49
Irvine     1.41
Anaheim    1.41
Yellow     1.35
Riverside  1.33
Activations Density 0.027%
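
To give the density figure some scale, the arithmetic can be sketched as below. This assumes the density is the share of sampled tokens on which the feature has a nonzero activation, which the page itself does not state. The helper name expected_active_tokens and the one-million-token sample size are hypothetical, chosen only for illustration.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected count of tokens with nonzero activation.

    Assumes density_percent is the percentage of tokens on which the
    feature fires (an assumption, not confirmed by the source page).
    """
    return density_percent / 100.0 * n_tokens


density_percent = 0.027   # value reported on the page
sample_size = 1_000_000   # hypothetical sample size for illustration

active = expected_active_tokens(density_percent, sample_size)
print(f"about {active:.0f} active tokens per {sample_size:,} tokens")
```

Under that reading, the feature fires on roughly 27 of every 100,000 tokens, which fits a feature tied to one specific word and a few related place names.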