Explanations
references to the color white or its variations in context
Negative Logits
adesh     -0.16
agger     -0.15
Ñĥв       -0.15
amma      -0.15
purple    -0.14
yny       -0.14
mann      -0.14
uby       -0.14
IMP       -0.14
emap      -0.14
Positive Logits
WHITE     0.26
-white    0.25
white     0.25
WHITE     0.24
White     0.24
White     0.23
white     0.22
çϽ        0.21
.White    0.20
سÙģÛĮد    0.20
Activations Density: 0.072%