INDEX
Explanations
proper nouns and names related to a specific context or category
Neuron Alignment
Index   Value   % of L₁
137     +0.11   0.3%
184     +0.10   0.3%
394     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
137     +0.11      0.05
184     +0.10      0.02
227     +0.09      0.05
Negative Logits
Autoritní   -0.73
Controllo   -0.71
satte       -0.70
Walkover    -0.68
saper       -0.68
noten       -0.68
abbra       -0.68
utop        -0.67
gymnas      -0.65
lapto       -0.64
Positive Logits
Shakspeare  1.00
Shaksp      0.95
depic       0.91
McLaugh     0.89
inappro     0.89
embodi      0.89
pamph       0.88
fath        0.87
unspeak     0.87
indestru    0.85
Activations Density: 0.417%