Explanations
phrases related to official or legal documents
references to investigative documents or cases involving reputational allegations
Negative Logits
jc        -0.80
rh        -0.77
ahime     -0.74
erd       -0.73
aird      -0.72
arij      -0.71
rm        -0.70
alone     -0.69
Portug    -0.69
Glob      -0.69
Positive Logits
dossier    1.59
Steele     1.27
ossier     0.99
mole       0.82
tro        0.75
sightings  0.74
memos      0.73
doping     0.73
agher      0.72
heaviest   0.70
Activations Density 0.018%