INDEX
Explanations
keywords related to a software framework or programming structure
Neuron Alignment
Index    Value    % of L₁
156      +0.20    1.2%
71       +0.12    0.8%
173      +0.11    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
71       +0.20       0.01
469      +0.12       0.01
26       +0.11       0.01
Negative Logits
woke          -2.01
Virgin        -1.79
deceased      -1.69
gay           -1.63
sed           -1.62
obese         -1.62
homosexual    -1.57
res           -1.51
toast         -1.49
euthan        -1.48
Positive Logits
"}](#         2.07
istry         1.89
ĸ             1.82
ULAR          1.81
CHA           1.73
cript         1.70
repertoire    1.69
erve          1.67
hel           1.66
Script        1.61
Activations Density 0.012%