INDEX
Explanations
references to information leaks or breaches
references to information breaches or disclosures
Negative Logits
Token      Logit
ryu        -0.70
Schw       -0.65
Krish      -0.64
Sax        -0.64
eka        -0.63
Num        -0.63
Cynthia    -0.62
Sue        -0.62
Sax        -0.62
song       -0.61
Positive Logits
Token             Logit
leaks             3.96
leaking           2.26
Leaks             2.24
leaked            2.18
leak              2.15
leakage           1.84
spills            1.75
disclosures       1.48
releases          1.44
whistleblowers    1.41
Activations Density: 0.011%