INDEX
Explanations
references to colors and their descriptions
Negative Logits
urr         -0.17
esi         -0.16
et          -0.15
errer       -0.15
olin        -0.15
els         -0.15
-century    -0.14
ester       -0.14
-style      -0.14
ively       -0.14
Positive Logits
-coded      0.22
ation       0.19
blind       0.19
/color      0.18
issant      0.17
atura       0.17
imeter      0.16
chemes      0.16
scheme      0.16
ings        0.15
Activations Density 0.067%