Explanations
descriptive color terminology and related adjectives
Negative Logits
hood        -0.15
esty        -0.15
iju         -0.14
assistant   -0.14
Painter     -0.14
FLAGS       -0.14
Assistant   -0.14
evin        -0.14
izu         -0.14
aurant      -0.14
Positive Logits
-col        0.30
色          0.30   (Chinese: "color")
-colored    0.29
-Col        0.28
-ton        0.27
col         0.27
colored     0.26
色的        0.25   (Chinese: "-colored")
tone        0.24
-tone       0.23
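The negative and positive logit lists can be sketched by projecting the feature's output direction onto the vocabulary and sorting. This is a minimal sketch, assuming each score is the feature's decoder direction multiplied by the model's unembedding matrix with the final LayerNorm ignored; the names `top_logit_effects`, `decoder_direction`, and `W_U` are hypothetical, and the toy matrix is illustrative data, not this feature's weights.

```python
import numpy as np

def top_logit_effects(decoder_direction, W_U, vocab, k=10):
    # Assumption: the direct effect on each output token is the feature's
    # decoder direction (shape: d_model) times the unembedding (d_model x vocab).
    effects = decoder_direction @ W_U
    order = np.argsort(effects)  # ascending, so the most negative come first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary (hypothetical).
vocab = ["-col", "hood", "the", "col"]
W_U = np.array([[0.30, -0.15, 0.00, 0.27],
                [5.00, 5.00, 5.00, 5.00]])
direction = np.array([1.0, 0.0])
negative, positive = top_logit_effects(direction, W_U, vocab, k=2)
print("negative:", negative)  # most suppressed tokens
print("positive:", positive)  # most promoted tokens
```

Sorting once and slicing from both ends yields both lists from a single pass over the vocabulary.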
Activation density: 0.063%
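Activation density is read here as the fraction of token positions on which the feature is active, so 0.063% corresponds to roughly 63 activating tokens per 100,000. A minimal sketch, assuming "active" means an activation strictly above zero; the function name `activation_density` and its `threshold` parameter are hypothetical, and the toy array is illustrative data.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    # Fraction of token positions where the feature's activation exceeds the threshold.
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy data: 63 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:63] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.063%
```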