INDEX
Explanations
mentions of terms and conditions, service agreements, legal language, and violations of terms of service
Neuron Alignment
Index   Value   % of L₁
1535    +0.12   0.4%
596     +0.11   0.3%
453     +0.11   0.3%
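The "% of L₁" column is not defined on this page. One plausible reading, assumed here and not confirmed by the source, is each neuron's absolute weight expressed as a share of the L₁ norm (sum of absolute values) of the feature's weight vector. A minimal sketch of that arithmetic, using a made-up vector and a hypothetical helper named `l1_share`:

```python
def l1_share(weights, top_k=3):
    """Return (index, value, share of L1 norm) for the top_k largest-magnitude entries.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The dashboard may
    define it differently.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # Rank positions by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:top_k]]


# Toy vector for illustration only; these are not values from the dashboard.
toy = [0.5, -0.25, 0.125, 0.125]
rows = l1_share(toy, top_k=2)
for index, value, share in rows:
    print(f"{index:<7} {value:+.2f}   {share:.1%}")
```

Under this reading, a value of +0.12 at 0.4% would imply an L₁ norm of roughly 30 for the full vector, which is consistent with many small weights spread across neurons.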
Correlated Neurons
Index   P. Corr.   Cos Sim.
596     +0.12      0.05
1334    +0.11      0.05
1034    +0.11      0.04
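The correlation and cosine-similarity columns measure different things, which is why they can disagree. Assuming the correlation is Pearson's, it equals the cosine similarity of the two vectors after each has had its mean subtracted. The page does not say which vectors are being compared, so the inputs below are toy data and the function names are illustrative:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy vectors for illustration only: orthogonal, yet perfectly anti-correlated.
u = [1.0, 0.0]
v = [0.0, 1.0]
print(cosine_similarity(u, v), pearson_correlation(u, v))
```

The two toy vectors have zero cosine similarity but a Pearson correlation of about -1, because centering removes the shared offset that the raw cosine ignores.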
Negative Logits
unspeak           -0.83
impractica        -0.72
Petitioners       -0.70
unlaw             -0.70
avancée           -0.69
endeavouring      -0.67
unwarran          -0.67
thereupon         -0.67
exemplaire        -0.67
proportionately   -0.66
Positive Logits
solidar           1.18
ideolog           1.10
kosme             0.96
meras             0.95
utop              0.95
alkoh             0.95
minimalis         0.92
impon             0.91
prostitu          0.91
akus              0.91
Activations Density 0.269%