INDEX
Explanations
mentions of revealing hidden information or identities
references to personal identity and significant revelations
NEGATIVE LOGITS
oslav       -0.75
trak        -0.72
Reviewed    -0.70
upkeep      -0.69
chairs      -0.68
HAM         -0.63
uated       -0.63
ucci        -0.62
assisted    -0.61
sta         -0.60
POSITIVE LOGITS
secrets            0.94
vulnerabilities    0.83
clues              0.82
whereabouts        0.78
trove              0.77
incrim             0.76
WikiLeaks          0.73
truths             0.73
details            0.72
Hidden             0.72
Activations Density 0.238%