Explanations
references to payment or salary-related information
Negative Logits
ritz          -0.17
iliki         -0.15
entai         -0.15
ansa          -0.15
foreground    -0.15
utex          -0.15
radios        -0.14
лова          -0.14
INI           -0.14
rale          -0.14
Positive Logits
Late          0.39
Late          0.34
late          0.29
late          0.25
Fallon        0.25
Tonight       0.24
Letter        0.23
Jimmy         0.23
Letter        0.21
Colbert       0.21
Activations Density 0.048%