INDEX
Explanations
names of politicians, locations, and legal terms related to individuals
Neuron Alignment
Index   Value   % of L₁
50      +0.12   0.7%
1677    +0.10   0.6%
938     +0.09   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1677    +0.12      0.02
1557    +0.10      0.02
58      +0.09      0.02
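The two column headers above most likely abbreviate Pearson correlation and cosine similarity. The page does not say which vectors are being compared, so the sketch below only shows the two standard formulas on toy data. The function names and input values are illustrative assumptions, not taken from the dashboard.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances are left unnormalised; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy inputs (hypothetical): perfectly correlated activations,
# and two orthogonal direction vectors.
toy_corr = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
toy_cos = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Unlike cosine similarity, Pearson correlation subtracts the mean first, so the two measures can differ widely on the same pair of vectors.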
Negative Logits
Token                  Logit
<bos>                  -2.68
<?                     -0.96
(token text missing)   -0.95
/***                   -0.93
ⓧ                      -0.80
<?                     -0.79
/**                    -0.79
//---                  -0.71
endow                  -0.64
abolish                -0.62
Positive Logits
Token   Logit
Rep     1.14
Rep     1.03
Reps    0.99
jawa    0.95
rep     0.95
REP     0.94
rep     0.93
reps    0.90
REP     0.85
Reps    0.83
Activations Density: 0.108%