INDEX
Explanations
terms related to control and repression of emotions or dissent
Negative Logits
andra    -0.14
/use     -0.14
bee      -0.14
enou     -0.14
lut      -0.14
ifu      -0.13
ismu     -0.13
kinh     -0.13
ìĭ       -0.13
Ihnen    -0.13
Positive Logits
/mit           0.24
ä½ı            0.20
ä½ı            0.19
expectations   0.19
/null          0.17
/pre           0.17
potential      0.16
further        0.15
expected       0.14
645            0.14
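The two logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such lists are commonly produced, assuming (not confirmed by this dashboard) that each score is the dot product of the feature's decoder direction with a column of the model's unembedding matrix, ignoring any final normalization. The function names and the d_model x vocab layout are hypothetical, and the toy numbers are illustrative only.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocab token by decoder_dir . unembed[:, v].

    Assumption: `unembed` is laid out as d_model rows x vocab columns.
    """
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][v] for i in range(d_model))
            for v in range(vocab)]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda v: scores[v])  # ascending
    bottom = [(tokens[v], round(scores[v], 2)) for v in order[:k]]
    top = [(tokens[v], round(scores[v], 2)) for v in reversed(order[-k:])]
    return top, bottom


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 made-up tokens.
    decoder_dir = [1.0, 0.5]
    unembed = [[0.2, -0.1, 0.0, 0.4],
               [0.1, -0.2, 0.4, -0.1]]
    tokens = ["a", "b", "c", "d"]
    top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), tokens, 2)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nlargest`) would avoid the full sort.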
Activation Density: 0.106%
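Activation density is usually read as the share of tokens on which the feature fires; at 0.106% that is roughly 1 token in 940 (100 / 0.106 ≈ 943). A minimal sketch of the computation, assuming "fires" means an activation strictly above zero; the `threshold` parameter and the function name are hypothetical, and the dashboard's exact sampling of tokens is not stated here.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation > threshold.
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total


if __name__ == "__main__":
    # One firing token out of 1,000 gives a density of 0.1%.
    acts = [0.0] * 999 + [1.3]
    print(f"{activation_density(acts):.3f}%")
```

Streaming over an iterable means the activations never need to be held in memory at once, which matters when density is estimated over millions of tokens.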