INDEX
Explanations
Phrases related to information provided by anonymous sources
Negative Logits
Token     Logit
neys      -0.82
union     -0.78
nant      -0.74
tons      -0.74
ney       -0.73
Done      -0.72
NEY       -0.71
rex       -0.69
yg        -0.68
abase     -0.66
Positive Logits
Token         Logit
anonymity     1.11
ously         1.07
anonym        1.01
onym          0.89
anonymously   0.82
onyms         0.80
pseudonym     0.74
anonymous     0.74
arial         0.73
informant     0.72
Activations Density 0.032%