Explanations
personal reflections and expressions of self-identity
Negative Logits
atsu      -0.16
amen      -0.15
abella    -0.14
ÅĤe       -0.14
ardo      -0.14
apr       -0.14
æ¡Ī       -0.13
ander     -0.13
imus      -0.13
igo       -0.13
Positive Logits
gros      0.15
tas       0.15
also      0.15
Crosby    0.15
tower     0.14
¥         0.14
také      0.14
#echo     0.14
alic      0.14
ilers     0.14
Activations Density 0.211%
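
The page does not state how the density figure is computed. One common reading, assumed here, is the fraction of token positions at which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not confirm this definition.
    """
    if len(activations) == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.3, 0.7]
print(f"{activation_density(sample):.3%}")  # prints 0.200%
```

Under this reading, a density of 0.211% would mean the feature fires on roughly 2 of every 1,000 tokens in the sampled text.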