Explanations
pronouns, particularly the word "they"
Neuron Alignment
Index   Value   % of L₁
1034    +0.11   0.3%
381     +0.10   0.3%
814     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1034    +0.11      0.07
1919    +0.10      0.07
1415    +0.09      0.04
Negative Logits
panama      -1.04
jurassic    -1.03
squa        -1.00
stockholm   -0.98
franz       -0.96
Comand      -0.96
fatis       -0.95
vhs         -0.94
claudia     -0.94
budapest    -0.94
Positive Logits
They         0.81
they         0.78
they         0.78
They         0.74
THEY         0.73
Their        0.72
themselves   0.71
their        0.69
THEY         0.67
họ           0.66
Activations Density: 0.353%