INDEX
Explanations
references to the color red
occurrences of the word "Red"
Negative Logits
awaru      -0.73
Ö¼         -0.70
AGES       -0.68
vre        -0.67
merce      -0.66
prest      -0.66
ILA        -0.64
physic     -0.62
ilities    -0.62
uncom      -0.61
Positive Logits
ucing      1.34
eem        1.33
emption    1.21
ucer       1.21
uced       1.19
uces       1.18
irect      1.16
acted      1.14
cliffe     1.13
uctions    1.06
Activations Density 0.018%