Explanations
expressions related to physical locations or towns
Neuron Alignment
Index    Value    % of L₁
1350     +0.08    0.2%
33       +0.07    0.2%
789      +0.06    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
283      +0.08       0.01
1343     +0.07       0.04
1363     +0.06       0.03
Negative Logits
ineffec           -0.66
introduce         -0.62
Jamieson          -0.60
later             -0.59
fatally           -0.59
Adair             -0.58
unjustly          -0.58
preferentially    -0.57
retracted         -0.57
wrongfully        -0.57
Positive Logits
wn         1.89
own        1.50
owns       1.31
awn        1.21
OWN        1.15
WN         1.12
meis       1.11
applau     1.08
bourgeo    1.08
fatis      1.07
Activations Density 0.300%