INDEX
Explanations
references to criticism and scrutiny of actions, particularly related to personal or social issues
Negative Logits
gu      -0.15
roken   -0.15
avage   -0.15
подс    -0.15
Wass    -0.15
eah     -0.14
cen     -0.14
549     -0.14
itra    -0.14
ENCH    -0.14
Positive Logits
MBER        0.15
ause        0.15
GIR         0.14
ARIO        0.14
CTL         0.14
istrovství  0.14
Invoker     0.14
Elliott     0.13
Db          0.13
idge        0.13
Activations Density 0.263%