INDEX
Explanations
instances of political dialogue and responses related to education and leadership
Negative Logits
uspend        -0.18
ĨĴ            -0.15
ilde          -0.15
æĬ¬           -0.14
.arguments    -0.14
interp        -0.14
asking        -0.13
suspend       -0.13
egend         -0.13
ãģĹãĤĩãģĨ     -0.13
Positive Logits
admitted      0.26
admit         0.24
crypt         0.24
hed           0.23
appeared      0.21
admits        0.21
acknowledge   0.21
seemed        0.21
admitting     0.21
admission     0.20
Activations Density 0.314%