INDEX
Explanations
references to power dynamics and governmental control in society
Negative Logits
bbe           -0.16
Anyway        -0.15
untime        -0.15
ongan         -0.15
irresistible  -0.14
sez           -0.14
kü            -0.14
надо          -0.14
incidental    -0.14
çĮĽ           -0.13
Positive Logits
void    0.18
engr    0.17
cater   0.17
chalk   0.16
缸      0.15
inline  0.15
jov     0.15
apart   0.15
hypo    0.15
VOID    0.15
Activations Density 0.396%