INDEX
Explanations
words related to communication and interaction
Neuron Alignment
Index   Value   % of L₁
559     +0.12   0.4%
1406    +0.12   0.4%
597     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
559     +0.12      0.03
597     +0.12      0.02
1406    +0.11      0.02
Negative Logits
lele          -0.64
Palembang     -0.58
Rued          -0.54
palab         -0.54
AccessLevel   -0.52
kram          -0.52
Sinal         -0.51
bandeau       -0.51
uhr           -0.51
alko          -0.51
Positive Logits
hang       1.23
hangs      1.20
hanging    1.15
hung       1.12
Hang       1.07
hanging    0.99
hang       0.99
Hanging    0.98
Hanging    0.97
hanged     0.96
Activations Density 0.072%