INDEX
Explanations
phrases related to responsibility and obligation
NEGATIVE LOGITS
ですが      -0.13
шила        -0.12
istes       -0.12
ایشان       -0.12
'],$_       -0.12
']!='       -0.12
وی          -0.12
ITEM        -0.11
нят         -0.11
которых     -0.10
POSITIVE LOGITS
it          1.27
它          0.94
оно         0.79
It          0.78
It          0.72
nó          0.71
it          0.69
_it         0.69
,it         0.67
воно        0.60
Activations Density 4.551%
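
The density figure above can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading under a stated assumption: density is taken to be the fraction of activations strictly above a threshold (zero by default). The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard or its tooling.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions on which
    the feature is active. The dashboard's exact definition may differ.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical activations for eight token positions; two are nonzero.
    sample = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0]
    print(f"{activation_density(sample):.3%}")  # prints 25.000%
```

Under this reading, a density of 4.551% would mean the feature is active on roughly one token position in twenty-two of the sampled text.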