Explanations
phrases related to privilege or private matters
terms related to privacy and privileged information
Negative Logits
ORN       -0.82
WAYS      -0.81
calling   -0.76
IELD      -0.73
Shake     -0.72
BOX       -0.70
rers      -0.70
dry       -0.67
grass     -0.67
Ducks     -0.67
Positive Logits
ileged     1.44
ilege      1.36
acies      1.32
ately      1.16
urrent     1.09
atism      1.04
iless      1.04
acy        0.99
ession     0.96
aband      0.93
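The logit lists show which output tokens this feature most promotes and suppresses. Dashboards of this kind commonly derive them by projecting the feature's decoder direction through the model's unembedding matrix and ranking tokens by the result. The sketch below assumes that method. The function names (`logit_effects`, `top_and_bottom`) and the toy vectors are hypothetical and are not taken from this dashboard.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly one feature's
# decoder direction raises or lowers each token's logit (direct logit effect).
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature direction with each token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 3-dimensional example; all numbers are made up for illustration.
    direction = [1.0, -0.5, 0.0]
    vocab = {
        "ileged": [1.2, -0.4, 0.3],
        "acy": [0.8, 0.0, 0.1],
        "grass": [-0.5, 0.4, 0.0],
        "ORN": [-0.6, 0.5, 0.2],
    }
    effects = logit_effects(direction, vocab)
    pos, neg = top_and_bottom(effects, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Real dashboards often normalize the decoder direction or subtract a mean first, so the sketch shows only the ranking idea, not the exact scale of the values listed above.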
Activation Density: 0.012%
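A density of 0.012% means the feature is active on roughly 12 of every 100,000 tokens. The sketch below assumes "active" means an activation strictly greater than zero. That threshold is an assumption, since the dashboard does not state it, and the function name `activation_density` is hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# the feature's activation is strictly positive (threshold is an assumption).
from typing import Iterable


def activation_density(activations: Iterable[float]) -> float:
    """Return the percentage of tokens where the feature fires (activation > 0)."""
    total = 0
    firing = 0
    for value in activations:
        total += 1
        if value > 0:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total


if __name__ == "__main__":
    # 12 firing tokens out of 100,000 corresponds to a density of 0.012%.
    acts = [0.0] * 99_988 + [1.5] * 12
    print(f"{activation_density(acts):.3f}%")
```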