INDEX
Explanations
quotes or phrases containing the words "In other words"
Neuron Alignment
Index   Value   % of L₁
75      +0.10   0.3%
872     +0.10   0.3%
1129    +0.09   0.3%
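
The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that it gives each entry's absolute value as a share of the L₁ norm (the sum of absolute values) of the whole vector. Under that reading, a value of 0.10 at 0.3% would imply an L₁ norm of roughly 33. The sketch below illustrates the calculation on a toy vector; the function name `pct_of_l1` and the toy values are hypothetical and are not taken from the dashboard.

```python
def pct_of_l1(weights, index):
    """Return the share (in percent) of the vector's L1 norm
    contributed by the entry at `index`.

    Assumption: "% of L1" means |w_i| / sum_j |w_j| * 100.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector (hypothetical values) whose L1 norm is exactly 1.0
toy = [0.5, -0.25, 0.25]
print(pct_of_l1(toy, 0))  # 50.0
print(pct_of_l1(toy, 1))  # 25.0
```

A small percentage for the top-ranked entries, as in the table above, would mean under this reading that the weight is spread thinly across many neurons rather than concentrated in a few.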
Correlated Neurons
Index   P. Corr.   Cos Sim.
62      +0.10      0.02
75      +0.10      0.02
569     +0.09      0.03
Negative Logits
suspic   -1.28
embra    -1.27
nece     -1.20
fuf      -1.20
fasc     -1.19
dispen   -1.17
inev     -1.16
secon    -1.15
pessi    -1.14
ftu      -1.13
Positive Logits
:          0.56
obacz      0.53
зульта     0.53
tört       0.52
riction    0.51
words      0.51
szól       0.51
hoffe      0.51
alá        0.50
palabras   0.50
Activations Density 0.093%