INDEX
Explanations
mentions of the city "Houston"
Neuron Alignment
Index   Value   % of L₁
597     +0.17   0.7%
1034    +0.16   0.7%
765     +0.14   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1034    +0.17      0.03
597     +0.16      0.03
765     +0.14      0.03
Negative Logits
cushi     -0.98
shenan    -0.97
increa    -0.97
ecru      -0.96
reluct    -0.95
suscep    -0.95
scrat     -0.95
unce      -0.92
embodi    -0.91
disagre   -0.91
Positive Logits
Houston   1.51
Houston   1.36
HOU       0.99
HOU       0.97
Astros    0.92
Texans    0.91
Texas     0.86
Hou       0.82
hou       0.82
Texas     0.78
Activations Density 0.078%