Explanations
Phrases containing the conjunction "and" together with references to colors, particularly "black" and "white."
Negative Logits
Token       Logit
ettle       -0.15
avigate     -0.15
sz          -0.14
stabbing    -0.14
ån          -0.13
ναν         -0.13
iken        -0.13
رب          -0.13
Cotton      -0.13
oure        -0.13
Positive Logits
Token       Logit
white       0.32
White       0.28
white       0.27
whites      0.27
çϽ          0.26
WHITE       0.24
White       0.24
-white      0.24
çϽ          0.23
WHITE       0.23
Activations Density 0.041%
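
Activation density is commonly read as the fraction of token positions on which a feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are hypothetical, and the exact definition used for the figure above is not stated on this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" at a position when its
    activation is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 41 active positions out of 100,000 tokens.
sample = [1.0] * 41 + [0.0] * 99_959
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.041%
```

Under this reading, a density of 0.041% would mean the feature is active on roughly 4 of every 10,000 tokens.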