INDEX
Explanations
mentions of the beginning or start of something
Neuron Alignment
Index    Value    % of L₁
1052     +0.09    0.3%
1135     +0.09    0.3%
405      +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1135     +0.09       0.03
405      +0.09       0.03
602      +0.09       0.02
Negative Logits
Literat           -0.73
Glej              -0.68
Iné               -0.64
Wię               -0.59
Pourtant          -0.58
Posteriormente    -0.58
__':              -0.57
Celular           -0.57
Біо               -0.56
Assista           -0.56
Positive Logits
outset       0.67
ladri        0.63
herre        0.59
cittad       0.56
syd          0.55
hotell       0.54
costumi      0.53
;;)          0.52
signore      0.51
beginning    0.50
Activations Density: 0.126%