Explanations
references to colors and color-related descriptors
Negative Logits
iken        -0.15
utin        -0.15
put         -0.14
ol          -0.14
lat         -0.14
ests        -0.14
usto        -0.14
cke         -0.13
happiest    -0.13
pper        -0.13
Positive Logits
лÑİд        0.16
年代        0.15
YLE         0.15
vej         0.15
neg         0.15
ongyang     0.14
imli        0.14
å¼          0.14
.truth      0.14
ÙĪØ«        0.14
Activations Density 0.035%