INDEX
Explanations
terms related to whistleblowing and whistleblowers
Negative Logits
elman    -0.17
ement    -0.16
zet      -0.15
urat     -0.15
ture     -0.15
slave    -0.15
aku      -0.15
tures    -0.14
Parr     -0.14
ieve     -0.14
Positive Logits
blown    0.27
blowing  0.25
blow     0.23
blew     0.22
blows    0.20
-wh      0.20
wh       0.20
吹       0.19
Blow     0.18
Wh       0.18
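One entry in the positive-logit list is a non-ASCII token. The source does not name the tokenizer, so the following is only a sketch under the assumption that the vocabulary uses GPT-2-style byte-level BPE. In that scheme every raw byte is displayed as a printable character, so the three characters U+00E5 U+0132 U+00B9 stand for the UTF-8 bytes E5 90 B9, which decode to 吹 (Chinese for "blow"). The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's vocabulary follows this convention.
    """
    # Bytes that are already printable keep their own code point.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    table = {b: chr(b) for b in keep}
    # Every remaining byte is shifted to 256, 257, ... in ascending order.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


def decode_token(display: str) -> str:
    """Turn a displayed vocabulary entry back into readable text."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


# Displayed form of the non-ASCII positive-logit token, written with escapes.
garbled = "\u00e5\u0132\u00b9"
print(decode_token(garbled))
```

If the assumption holds, this token fits the rest of the list, which is dominated by forms of "blow" and the "wh" prefix of "whistle".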
Activations Density 0.021%