Explanations
phrases related to intellectual analysis and critique
Neuron Alignment
Index    Value    % of L₁
597      +0.10    0.3%
855      +0.10    0.3%
1438     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
62       +0.10       0.03
1438     +0.10       0.03
855      +0.10       0.03
Negative Logits
fasc      -0.69
pyram     -0.67
dispen    -0.65
mû        -0.63
XXXVII    -0.63
contex    -0.63
monot     -0.62
↔         -0.62
mait      -0.62
adal      -0.62
Positive Logits
almost         0.59
bardziej       0.59
Več            0.59
practically    0.56
idać           0.56
daß            0.55
беріга         0.54
zelfs          0.53
orothy         0.53
virtually      0.53
Activations Density 0.175%