INDEX
Explanations
personal pronouns related to oneself
Neuron Alignment
Index    Value    % of L₁
50       +0.18    1.1%
381      +0.13    0.8%
61       +0.11    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
805      +0.18       0.04
325      +0.13       0.04
61       +0.11       0.04
Negative Logits
<bos>         -2.88
springfox     -0.77
guma          -0.69
EndProject    -0.68
lateinit      -0.66
(blank)       -0.66
jakarta       -0.65
säkert        -0.62
/**           -0.61
</tbody>      -0.58
Positive Logits
disreg      1.23
unlaw       1.18
malheure    1.16
habile      1.15
véhic       1.15
shenan      1.15
héro        1.10
effray      1.08
accla       1.07
expéri      1.05
Activations Density: 0.117%