INDEX
Explanations
phrases indicating past actions and events
Negative Logits
Token          Logit
lation         -0.76
Translation    -0.70
Universal      -0.67
¨              -0.67
çīĪ            -0.67
endings        -0.66
elman          -0.66
ankind         -0.66
reality        -0.65
~~             -0.65
Positive Logits
Token          Logit
unlawfully     1.01
voluntarily    0.89
mol            0.89
arrested       0.86
illegally      0.85
peacefully     0.84
felony         0.83
abducted       0.83
erratic        0.83
explosives     0.82
Activations Density: 0.205%