INDEX
Explanations
mentions of political figures and healthcare issues
Neuron Alignment
Index   Value   % of L₁
382     +0.16   0.5%
1741    +0.16   0.5%
394     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.16      0.10
1265    +0.16      0.07
736     +0.10      0.07
Negative Logits
praktik   -0.64
diki      -0.61
konsult   -0.61
ekst      -0.61
antik     -0.61
autogui   -0.59
pecuni    -0.59
meras     -0.58
konserv   -0.57
stili     -0.57
Positive Logits
quitted        0.85
vainly         0.72
gild           0.67
impelled       0.66
endeavouring   0.65
shuddered      0.65
interposed     0.64
tolerably      0.64
kindled        0.64
strove         0.63
Activations Density 0.550%