INDEX
Explanations
references to various personal and societal responsibilities
Negative Logits
tha       -0.18
ندا       -0.15
with      -0.14
ERVICE    -0.14
ël        -0.14
croft     -0.14
Opinion   -0.13
полож     -0.13
wards     -0.13
          -0.13
Positive Logits
regard    0.23
regards   0.21
stood     0.19
impunity  0.19
gusto     0.19
lac       0.18
uzzi      0.17
ering     0.17
τρό       0.16
silver    0.16
Activations Density 0.725%