INDEX
Explanations
references to moral concepts and Buddhist teachings
Neuron Alignment
Index    Value    % of L₁
1842     +0.16    0.5%
1380     +0.15    0.5%
1446     +0.14    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1380     +0.16       0.03
509      +0.15       0.06
1446     +0.14       0.03
Negative Logits
***!            -0.71
Championnat     -0.68
scorso          -0.67
giù             -0.67
calciatore      -0.66
Unito           -0.66
interag         -0.66
Venise          -0.65
DoubleQuotes    -0.64
hoeddwyd        -0.64
Positive Logits
McLaugh     0.94
McInt       0.89
Daven       0.80
Bartholo    0.80
<^          0.79
Harms       0.74
Cuth        0.73
Thier       0.73
Immig       0.69
Rine        0.69
Activations Density 0.414%