INDEX
Explanations
personal stories or anecdotes shared anonymously
Neuron Alignment
Index   Value   % of L₁
1964    +0.15   0.7%
1557    +0.13   0.6%
1777    +0.10   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1557    +0.15      0.02
1964    +0.13      0.02
198     +0.10      0.02
Negative Logits
<bos>          -2.73
AddWithValue   -0.72
Subjects       -0.68
Во             -0.67
import         -0.67
Wy             -0.64
WriteLiteral   -0.64
port           -0.63
break          -0.63
term           -0.63
Positive Logits
maneu          1.66
indestru       1.46
lamborghini    1.44
mondeo         1.43
reluct         1.43
scrat          1.41
fortn          1.41
shenan         1.39
squa           1.38
emphat         1.37
Activations Density: 0.194%