Explanations
phrases that indicate mechanisms of influence or control within society
Negative Logits
zed                  -0.14
本当                 -0.14
营业                 -0.13
.DefaultCellStyle    -0.13
ไว                   -0.13
erge                 -0.13
ield                 -0.13
高清                 -0.13
ctr                  -0.13
damit                -0.13
Positive Logits
means                0.29
sheer                0.27
puts                 0.24
ought                0.24
various              0.24
/by                  0.23
put                  0.20
direct               0.20
ogh                  0.20
means                0.20
Activations Density 0.123%