INDEX
Explanations
phrases indicating capability or potential in relation to actions
Negative Logits
تب          -0.14
indo        -0.14
امبر        -0.14
j           -0.14
jur         -0.13
stadt       -0.13
ذ           -0.13
practise    -0.13
vil         -0.13
pora        -0.13
Positive Logits
cia         0.15
omin        0.15
адки        0.15
(Intent     0.14
ida         0.14
ɵ           0.14
unan        0.14
디시        0.14
-bodied     0.14
flater      0.14
Activations Density 0.022%