INDEX
Explanations
items described by their colors and features
descriptive attributes related to colors and features
Negative Logits
Token        Logit
igious       -0.78
ealous       -0.74
bably        -0.71
ngth         -0.71
izarre       -0.69
atican       -0.67
urdue        -0.66
financial    -0.65
utenberg     -0.64
xious        -0.63
Positive Logits
Token        Logit
flakes       0.77
stadt        0.71
oxide        0.70
ioxide       0.69
blot         0.69
stripes      0.69
striped      0.68
Iris         0.67
Ultr         0.66
espresso     0.66
Activations Density 0.394%
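
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero; that definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all illustrative assumptions, not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the
    feature fires at all (activation > 0 by default).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1000 token positions, 4 of which fire.
acts = [0.0] * 1000
for position, value in [(3, 0.8), (250, 1.2), (600, 0.1), (999, 2.5)]:
    acts[position] = value

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.394% would mean the feature fires on roughly 4 of every 1000 sampled tokens.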