INDEX
Explanations
phrases related to giving commands or instructions
Neuron Alignment
Index   Value   % of L₁
1510    +0.08   0.2%
1415    +0.07   0.2%
1978    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1377    +0.08      0.03
1415    +0.07      0.03
1759    +0.07      0.04
Negative Logits
Traité      -0.98
Mémoires    -0.98
fta         -0.98
hcm         -0.96
milf        -0.95
Rois        -0.93
fte         -0.92
fter        -0.92
guarante    -0.92
„,          -0.88
Positive Logits
Havolalar   0.64
quit        0.61
stop        0.57
stay        0.56
go          0.54
switch      0.54
get         0.54
مشين        0.53
calma       0.53
testify     0.52
Activations Density 0.321%
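
For readers unfamiliar with the density figure above, here is a minimal sketch of how such a number could be computed. It assumes density means the percentage of sampled tokens on which the feature's activation is above zero; the dashboard does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the share of tokens on which the feature
    fires at all; the source does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 1,000 tokens, 3 of which activate the feature.
toy = [0.0] * 997 + [0.4, 1.2, 0.05]
print(f"{activation_density(toy):.3f}%")  # prints "0.300%"
```

Under this reading, a density of 0.321% would mean the feature fires on roughly 3 of every 1,000 sampled tokens.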