INDEX
Explanations
verbs related to actions, decision-making, and emotions like thinking, trying, feeling, and being sorry
Neuron Alignment
Index   Value   % of L₁
145     +0.08   0.2%
1978    +0.07   0.2%
878     +0.07   0.2%
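The page does not define the "% of L₁" column. One plausible reading, shown here only as an assumption, is each entry's absolute value divided by the L₁ norm (the sum of absolute values) of the whole vector. A minimal sketch of that reading follows; the function name `top_l1_shares` and the toy vector are hypothetical and are not taken from this page.

```python
def top_l1_shares(weights, k=3):
    """Return the k largest-magnitude entries as (index, value, share of L1 norm).

    Assumption: "% of L1" means |value| / sum(|all values|).
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # Rank indices by absolute value, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


# Hypothetical toy vector, not data from this page.
toy = [0.5, -0.25, 0.125, 0.125]
rows = top_l1_shares(toy, k=2)
for index, value, share in rows:
    print(f"{index:<7} {value:+.2f}   {share:.1%}")
```

Under this reading, a value of +0.08 at 0.2% would imply an L₁ norm of roughly 40, spread across many small entries.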
Correlated Neurons
Index   P. Corr.   Cos Sim.
1309    +0.08      0.03
1904    +0.07      0.04
330     +0.07      0.03
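"P. Corr." and "Cos Sim." presumably stand for Pearson correlation and cosine similarity. The page does not say which vectors are being compared, so the sketch below only illustrates how the two measures differ on the same pair of vectors: Pearson correlation is cosine similarity after subtracting each vector's mean. The function names and sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical vectors, not data from this page.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
cos_uv = cosine_similarity(u, v)
corr_uv = pearson_correlation(u, v)
print(f"cosine={cos_uv:.3f} pearson={corr_uv:.3f}")
```

The two numbers can disagree sharply, as they do for this pair, so they are best read as separate signals rather than two estimates of the same quantity.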
Negative Logits
affor       -1.11
impra       -1.10
increa      -1.07
maneu       -1.05
uniqu       -1.03
fta         -1.02
inappro     -1.00
accla       -0.99
lola        -0.98
stockholm   -0.97
Positive Logits
worry         0.64
focus         0.61
worried       0.54
worrying      0.54
necessarily   0.53
thinking      0.53
anything      0.52
tonode        0.51
focusing      0.50
worries       0.49
Activations Density 0.243%
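Activation density is commonly read as the fraction of sampled tokens on which the feature fires at all. Assuming that definition (the page does not state it), a minimal sketch looks like this; `activation_density`, its `threshold` parameter, and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumption: "density" counts tokens with activation > threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations, not data from this page.
toy_activations = [0.0, 0.0, 1.5, 0.0]
density = activation_density(toy_activations)
print(f"Activations Density {density:.3%}")
```

On that reading, a density of 0.243% would mean the feature is active on roughly 1 token in 400.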