Explanations
words related to categories or classifications
Negative Logits
uptools   -0.19
te        -0.18
raud      -0.16
tti       -0.16
ngen      -0.16
traits    -0.16
nard      -0.16
tec       -0.16
lli       -0.15
sla       -0.15
Positive Logits
cly       0.23
ción      0.23
re        0.22
h         0.21
rella     0.19
hom       0.18
ways      0.18
c         0.18
cube      0.17
fal       0.17
Activations Density: 0.020%