INDEX
Explanations
-information related to social issues like healthcare, women's rights, and food safety
Neuron Alignment
Index   Value   % of L₁
651     +0.08   0.2%
1062    +0.08   0.2%
394     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
919     +0.08      0.04
394     +0.08      0.06
651     +0.08      0.04
Negative Logits
aquare   -0.88
stoff    -0.84
franz    -0.84
affez    -0.84
perle    -0.83
marte    -0.83
fluo     -0.80
ananas   -0.80
quí      -0.78
persil   -0.78
Positive Logits
occurs    0.89
isn       0.89
is        0.84
happens   0.81
seems     0.81
wasn      0.80
abounds   0.79
becomes   0.78
tends     0.78
has       0.77
Activations Density 0.508%