INDEX
Explanations
prominent nouns and associated educational concepts
Negative Logits
zem     -0.15
unte    -0.14
ner     -0.14
Mojo    -0.14
thy     -0.14
asto    -0.13
лоб     -0.13
iqueta  -0.13
.Azure  -0.13
ampoo   -0.13
Positive Logits
auce    0.16
VERS    0.15
kyt     0.14
Stam    0.14
ATRIX   0.14
yme     0.13
377     0.13
chwitz  0.13
Sele    0.13
bx      0.13
Activations Density: 0.025%