INDEX
Explanations
phrases related to privacy or privilege
terms related to privacy and privilege
Negative Logits
gaard         -0.68
Ryder         -0.63
Benedict      -0.62
GER           -0.62
Modest        -0.60
Glob          -0.58
Gen           -0.58
Frankenstein  -0.56
Danielle      -0.56
LER           -0.56
Positive Logits
ilege          1.59
ileged         1.51
ately          1.02
ilage          1.00
opoly          0.96
opol           0.95
folios         0.93
ropri          0.89
uded           0.86
omore          0.86
Activations Density: 0.055%