INDEX
Explanations
phrases indicating research studies and their methodologies
Neuron Alignment
Index   Value   % of L₁
50      +0.25   0.9%
1757    +0.15   0.5%
1350    +0.15   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1350    +0.25      0.10
1757    +0.15      0.10
1296    +0.15      0.07
Negative Logits
Token      Logit
<bos>      -1.50
Cringe     -0.72
assiste    -0.70
confirme   -0.65
Fuckin     -0.58
feign      -0.57
croit      -0.56
constate   -0.56
/***       -0.55
captiv     -0.54
Positive Logits
Token      Logit
maroc      1.02
unwarran   1.01
Keny       0.99
Hez        0.94
bahay      0.93
kani       0.91
saad       0.90
bagay      0.90
mikrofon   0.89
susun      0.89
Activations Density: 2.327%