INDEX
Explanations
words related to symbolism
Neuron Alignment
Index   Value   % of L₁
871     +0.13   0.5%
1516    +0.12   0.5%
1677    +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1516    +0.13      0.02
1677    +0.12      0.02
1363    +0.12      0.02
Negative Logits
intrigu      -0.72
psg          -0.69
jurassic     -0.69
pikachu      -0.68
disagre      -0.66
rtx          -0.65
inconce      -0.65
intersper    -0.64
Machia       -0.64
blackpink    -0.63
POSITIVE LOGITS
symbol     1.43
symbol     1.38
Symbol     1.35
Symbol     1.32
symbols    1.30
SYMBOL     1.17
symbols    1.17
Symbols    1.11
symbole    1.10
Symbols    1.08
Activations Density 0.101%