INDEX
Explanations
mentions of obligations or necessities
Negative Logits
Token       Logit
iry         -0.15
239         -0.15
fol         -0.15
lant        -0.15
Dul         -0.14
Startup     -0.14
Overnight   -0.14
Äijá»       -0.14
Ùĩر         -0.13
itary       -0.13
Positive Logits
Token       Logit
admit       0.30
confession  0.24
confess     0.23
admitting   0.23
admission   0.20
admits      0.20
admitted    0.19
confessed   0.18
admissions  0.18
óż          0.17
Activations Density 0.036%