INDEX
Explanations
phrases related to discrimination laws and legal compliance
Neuron Alignment
Index   Value   % of L₁
650     +0.08   0.2%
1107    +0.07   0.2%
1763    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
283     +0.08      0.03
1363    +0.07      0.07
690     +0.07      0.08
Negative Logits
sergio        -0.78
authentique   -0.69
roberto       -0.68
THISDAY       -0.65
aussitôt      -0.63
Senti         -0.63
découv        -0.62
Să            -0.61
logotipo      -0.61
Dimensi       -0.60
Positive Logits
∎         2.18
↩         1.22
pessi     1.19
banan     1.17
kram      1.13
ciga      1.12
Violon    1.10
luxem     1.10
bayern    1.10
ohr       1.10
Activations Density: 0.589%