INDEX
Explanations
nouns and actions related to classification and categorization
Negative Logits
iko      -0.17
ighth    -0.15
lico     -0.15
jev      -0.14
hev      -0.14
izo      -0.14
asco     -0.14
uras     -0.14
uh       -0.14
imen     -0.14
Positive Logits
strup    0.18
thew     0.17
egrator  0.17
ân       0.15
Hop      0.14
slit     0.14
,proto   0.14
Proto    0.14
æį       0.14
licit    0.13
Activations Density 0.026%