Explanations
words associated with stealth or secretive actions
Neuron Alignment
Index   Value   % of L₁
59      +0.14   0.8%
111     +0.14   0.8%
127     +0.13   0.7%
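The Neuron Alignment table can be read with a small sketch. It assumes that the feature is represented by a weight vector over neurons, that the rows are the neurons with the largest weights, and that "% of L₁" is each weight's absolute value divided by the vector's L1 norm. The function name `neuron_alignment` and the toy weights are hypothetical and do not come from the dashboard.

```python
def neuron_alignment(weights, k=3):
    """Return the k neurons with the largest weights as (index, value, share)
    tuples, where share = |value| / L1 norm (assumed meaning of "% of L1")."""
    l1 = sum(abs(w) for w in weights)
    # Rank neuron indices by weight, largest first.
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    return [(i, weights[i], abs(weights[i]) / l1) for i in top]


# Toy example: a hypothetical 5-neuron weight vector with L1 norm 1.0.
weights = [0.1, -0.3, 0.4, 0.0, 0.2]
for index, value, share in neuron_alignment(weights):
    print(f"{index:<6}{value:+.2f}   {share:.1%}")
```

Under this reading, a small share per neuron (under 1% here) means that no single neuron accounts for much of the feature's weight.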
Correlated Neurons
Index   P. Corr.   Cos Sim.
127     +0.14      0.01
422     +0.14      0.01
59      +0.13      0.01
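The two columns of Correlated Neurons can be sketched in plain Python. The sketch assumes that "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and that "Cos Sim." is the cosine similarity between their weight vectors. The helper names and toy inputs are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy activations over 5 tokens for a feature and one neuron (hypothetical).
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.8]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.3]
print(f"P. Corr. {pearson(feature_acts, neuron_acts):+.2f}")

# Toy weight vectors (hypothetical); orthogonal vectors give 0.
print(f"Cos Sim. {cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):.2f}")
```

Under this reading, the two measures can diverge, as they do in the table: activations correlate weakly (about +0.14) while the weight directions are nearly orthogonal (cosine similarity about 0.01).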
Negative Logits
hip       -1.58
poons     -1.53
seat      -1.43
odi       -1.42
anya      -1.40
head      -1.40
ione      -1.39
resa      -1.38
stomach   -1.37
myself    -1.35
Positive Logits
¯         1.92
ĻĤ        1.68
¤         1.62
Ĵ         1.59
agogue    1.54
Īĺ        1.50
Ģ         1.50
ĺ         1.48
º         1.48
otypes    1.48
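Positive and negative logit lists like these are commonly produced by projecting the feature's output direction through the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes that procedure. The vocabulary, dimensions, weights, and the function name `logit_effects` are toy stand-ins, not values from this model.

```python
def logit_effects(direction, unembed, vocab, k=3):
    """Project a feature direction (length d_model) through an unembedding
    matrix (d_model rows x len(vocab) columns) and return the k most negative
    and the k most positive (token, logit) pairs."""
    logits = [
        sum(direction[r] * unembed[r][c] for r in range(len(direction)))
        for c in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return negative, positive


# Toy model: d_model = 2 and a vocabulary of 4 hypothetical tokens.
vocab = ["a", "b", "c", "d"]
unembed = [
    [1.0, -1.0, 0.5, 0.0],
    [0.0, 2.0, -0.5, 1.0],
]
direction = [1.0, -0.5]
negative, positive = logit_effects(direction, unembed, vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

If the model uses a GPT-2-style byte-level BPE vocabulary, several positive-logit entries (for example Ģ, Ĵ, ¤) may be byte-level pieces that encode fragments of multi-byte UTF-8 characters rather than readable text.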
Activation Density: 0.068%
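Activation density is typically the fraction of tokens in a sample on which the feature fires, meaning its activation is above zero. The following is a minimal sketch under that assumption. The `threshold` parameter and the toy activations are illustrative.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample: the feature fires on 2 of 8 tokens.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"Activation Density {activation_density(acts):.3%}")
```

Under this reading, a density of 0.068% means the feature fires on roughly 68 tokens per 100,000, so it is very sparse.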