INDEX
Explanations
phrases indicating the initiation of processes or concepts
Negative Logits
Shapiro   -0.15
apia      -0.15
imi       -0.13
azu       -0.13
asil      -0.13
ico       -0.13
ابت       -0.13
Roth      -0.13
Acceler   -0.13
lich      -0.13
Positive Logits
esson     0.19
ATO       0.17
tti       0.16
umpt      0.16
vanced    0.16
Ñĥж       0.16
parte     0.15
phans     0.15
ONGO      0.15
æļ        0.15
Activations Density 0.073%