INDEX
Explanations
phrases indicating capability or ability to perform actions
Negative Logits
ur         -0.17
indent     -0.15
osphere    -0.15
onta       -0.15
fair       -0.15
viso       -0.14
اÙĦع       -0.14
sen        -0.14
Ney        -0.14
ãĤ§        -0.14
Positive Logits
hazi       0.16
endez      0.15
ivant      0.15
veyor      0.15
aims       0.14
ifikasi    0.14
ehir       0.14
_PUS       0.14
oppins     0.14
rosse      0.13
Activations Density 0.037%