INDEX
Explanations
the concept of 'means' or methods used to achieve something
Neuron Alignment
Index    Value    % of L₁
50       +0.28    1.3%
68       +0.14    0.7%
32       +0.12    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1983     +0.28       0.03
68       +0.14       0.03
1705     +0.12       0.02
Negative Logits
<bos>        -2.24
/***         -0.62
脊           -0.62
public       -0.61
restore      -0.60
돕           -0.60
cu           -0.58
protected    -0.58
Stu          -0.58
Hunter       -0.57
Positive Logits
milano       1.41
soggior      1.41
claudia      1.38
paradiso     1.38
coar         1.37
napoli       1.35
pymysql      1.33
santiago     1.33
affez        1.31
jorge        1.31
Activations Density 0.043%