INDEX
Explanations
definitions or explanations of terms or concepts
Neuron Alignment
Index    Value    % of L₁
50       +0.24    1.1%
1451     +0.10    0.5%
468      +0.10    0.5%
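The "% of L₁" column is not defined on this page. One plausible reading, offered only as an assumption, is that it reports each neuron's absolute value as a share of the L₁ norm (sum of absolute values) of the whole vector. A minimal sketch under that assumption follows; the function name `l1_share` and the toy weights are hypothetical and are not taken from the table.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means one entry's absolute value divided by
    the sum of absolute values of all entries.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, not data from the dashboard.
toy_weights = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_weights, 0):.1%}")  # prints "50.0%"
```

Under this reading, small percentages such as those in the table would indicate that the value is spread across many neurons rather than concentrated in a few.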
Correlated Neurons
Index    P. Corr.    Cos Sim.
468      +0.24       0.05
703      +0.10       0.05
276      +0.10       0.04
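Assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity (the page does not spell either out, nor say which vectors are compared), the two measures can be sketched as below. Pearson correlation is cosine similarity applied after subtracting each vector's mean, which is why the two columns can differ. The function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy vectors, not data from the dashboard.
a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
print(cosine_similarity(a, b))    # positive: both vectors have positive entries
print(pearson_correlation(a, b))  # close to -1: one rises as the other falls
```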
Negative Logits
<bos>        -2.78
ⓧ            -0.81
/**          -0.68
/*           -0.61
Autoritní    -0.61
///**        -0.60
             -0.60
#![          -0.58
,            -0.57
continue     -0.57
Positive Logits
Minang       1.39
véhic        1.36
fatis        1.35
Juf          1.35
maroc        1.34
tramont      1.31
applau       1.31
socie        1.31
mef          1.30
wien         1.30
Activations Density 0.144%