INDEX
Explanations
Words related to the concepts of "red" and "rod."
Negative Logits
apo       -0.74
citiz     -0.71
imates    -0.69
ahi       -0.63
ubis      -0.63
chens     -0.60
bern      -0.60
emis      -0.58
que       -0.58
favors    -0.57
Positive Logits
ynam       1.02
ynamic     0.99
ollar      0.78
inson      0.77
ership     0.75
iol        0.75
igate      0.71
hair       0.69
hands      0.66
orf        0.65
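The two lists look like the ten lowest and ten highest entries of a per-token score vector. A minimal sketch of that selection follows, assuming a hypothetical 1-D array `logit_effects` with one score per vocabulary entry and an aligned `vocab` list. The function name is illustrative, not taken from any dashboard's code.

```python
import numpy as np

def top_and_bottom_tokens(logit_effects, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    logit_effects: hypothetical 1-D array, one score per vocabulary entry.
    vocab: token strings aligned index-by-index with logit_effects.
    """
    scores = np.asarray(logit_effects)
    order = np.argsort(scores)  # indices sorted by ascending score
    # Most negative first.
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    # Reverse the ascending order so the most positive comes first.
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

neg, pos = top_and_bottom_tokens([-0.74, 1.02, 0.69, -0.58],
                                 ["apo", "ynam", "hair", "que"], k=2)
print(neg)  # [('apo', -0.74), ('que', -0.58)]
print(pos)  # [('ynam', 1.02), ('hair', 0.69)]
```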
Activation density: 0.144%
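Here we assume activation density means the share of tokens on which the feature has a positive activation. Under that reading, 0.144% is a fraction of 0.00144, or about 1,440 tokens out of every 1,000,000. The sketch below computes this from a hypothetical array of per-token activations. The `threshold` parameter is an illustrative assumption.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`.

    activations: hypothetical per-token activation values for one feature.
    """
    acts = np.asarray(activations, dtype=float)
    # The mean of a boolean mask is the fraction of True entries.
    return float(np.mean(acts > threshold))

print(activation_density([0.0, 0.0, 0.5, 0.0, 1.2]))  # 0.4
```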