Explanations
words related to secret information or investigations
references to investigative reports or documents containing allegations
Negative Logits
Token         Logit
speed         -0.81
Pyr           -0.76
rh            -0.76
rl            -0.75
Saw           -0.69
Wilson        -0.68
Mariners      -0.67
alone         -0.65
gro           -0.65
Instrument    -0.64
Positive Logits
Token         Logit
dossier       1.46
trove         0.96
compiled      0.87
ossier        0.83
bombshell     0.83
memos         0.79
fodder        0.75
allegations   0.74
Ukrain        0.73
alleging      0.73
Activations Density: 0.027%