Explanations
phrases and sentences related to data privacy and security
Neuron Alignment
Index   Value   % of L₁
198     +0.12   0.3%
1403    +0.10   0.3%
1368    +0.09   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1701    +0.12      0.05
1553    +0.10      0.05
1368    +0.09      0.03
Negative Logits
Souha        -0.92
Mejía        -0.82
Messieurs    -0.79
pamph        -0.77
shenan       -0.76
Shakspeare   -0.76
poetical     -0.71
McLaugh      -0.71
Joaqu        -0.69
Mlle         -0.68
Positive Logits
privacy      0.76
Privacy      0.71
Privacy      0.71
ersonal      0.66
adat         0.66
frans        0.64
privacy      0.63
anonim       0.63
meda         0.60
dita         0.59
Activations Density 0.458%