Explanations
phrases containing the word "which" followed by various contextual information
Neuron Alignment
Index   Value   % of L₁
513     +0.13   0.4%
871     +0.10   0.3%
795     +0.10   0.3%
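
The page does not define the "% of L₁" column. One plausible reading, offered here only as an assumption, is that it reports the share of a vector's L₁ norm (the sum of absolute values of its entries) carried by a single neuron. Under that reading the listed values would imply a total L₁ norm on the order of 30. The function name `l1_share` and the toy vector below are hypothetical and do not come from the page.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means one entry's share of the sum of
    absolute values. The source page does not confirm this definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector (not data from the page): the L1 norm is 0.5 + 0.25 + 0.25 = 1.0
toy_weights = [0.5, -0.25, 0.25]
print(l1_share(toy_weights, 0))  # 50.0
```

A small percentage for the top-ranked neuron, as in the table, would mean the weight is spread thinly across many neurons rather than concentrated in a few.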
Correlated Neurons
Index   P. Corr.   Cos Sim.
513     +0.13      0.05
795     +0.10      0.04
528     +0.10      0.04
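
"P. Corr." and "Cos Sim." presumably stand for Pearson correlation and cosine similarity; the page does not say which vectors each column compares. As a minimal sketch of how the two measures differ, the code below computes both for plain Python lists. Pearson correlation is cosine similarity applied after subtracting each vector's mean. The function names and sample vectors are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Illustrative vectors, not data from the page.
rising = [1.0, 2.0, 3.0]
falling = [3.0, 2.0, 1.0]
print(pearson_correlation(rising, falling))  # close to -1: perfectly anti-correlated
print(cosine_similarity(rising, falling))    # positive: all entries share a sign
```

The two measures can disagree sharply, as in this example, so the columns are not interchangeable.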
Negative Logits
kram     -1.36
lele     -1.32
hina     -1.15
gend     -1.15
meis     -1.13
ananas   -1.11
krab     -1.10
tomat    -1.10
karton   -1.08
saar     -1.07
Positive Logits
which       0.82
thereupon   0.80
tolerably   0.77
vainly      0.77
Which       0.77
Which       0.73
whither     0.73
apprehen    0.71
exasper     0.70
makes       0.69
Activations Density: 0.134%
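
The page gives the density figure without a definition. A common reading, assumed here, is the percentage of sampled tokens on which the feature's activation is above zero. The sketch below computes that quantity for a list of activation values; `activation_density` and its `threshold` parameter are hypothetical names, and the sample list is invented for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of activation values strictly above `threshold`.

    Assumption: density means the share of tokens on which the feature
    fires. The source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Invented sample: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 0.0, 1.5]))  # 25.0
```

Under this reading, a density of 0.134% would correspond to the feature firing on roughly one token in every 750.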