Explanations
names of a specific individual or character
Neuron Alignment
Index   Value   % of L₁
1053    +0.16   0.7%
204     +0.12   0.5%
1023    +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1053    +0.16      0.02
966     +0.12      0.02
204     +0.12      0.02
Negative Logits
Token        Logit
unspeak      -0.72
impelled     -0.65
unceasing    -0.60
wzgl         -0.59
plenti       -0.57
cushi        -0.56
indescri     -0.55
realizacji   -0.55
shewn        -0.54
unavoid      -0.53
Positive Logits
Token        Logit
Jon          1.45
Jon          1.38
jon          1.34
JON          1.32
Jonathan     1.20
Jonathan     1.16
JON          1.12
notor        1.04
kön          0.97
Sén          0.96
Activations Density 0.110%