INDEX
Explanations
phrases related to academic citations
Neuron Alignment
Index   Value   % of L₁
674     +0.15   0.4%
1967    +0.12   0.3%
1978    +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
321     +0.15      0.05
776     +0.12      0.05
1804    +0.11      0.04
Negative Logits
maneu      -1.98
shenan     -1.96
unspeak    -1.96
hairc      -1.95
impra      -1.93
disagre    -1.92
increa     -1.89
apprehen   -1.89
depic      -1.88
indestru   -1.87
Positive Logits
kosme       0.86
silikon     0.81
alkoh       0.79
pól         0.76
konserv     0.74
radikal     0.72
solidar     0.72
kabel       0.72
kontrak     0.70
minimalis   0.69
Activations Density: 0.144%