INDEX
Explanations
phrases or conditional statements indicating the ability of subjects to perform actions or achieve outcomes
Negative Logits
аш            -0.15
acket         -0.15
uh            -0.15
Serious       -0.15
aku           -0.14
Responsible   -0.14
icip          -0.14
orig          -0.14
ahat          -0.14
zi            -0.13
Positive Logits
بتوان         0.18
INTERRU       0.15
can           0.15
proper        0.15
accred        0.15
better        0.14
ITED          0.14
hopefully     0.14
ballo         0.13
ouns          0.13
Activations Density 0.095%