Explanations
phrases asking someone to do something or imperatives
Neuron Alignment
Index   Value   % of L₁
130     +0.13   0.4%
689     +0.11   0.4%
1271    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
130     +0.13      0.04
689     +0.11      0.04
1271    +0.11      0.03
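
The two column headers above abbreviate two different similarity measures. The sketch below shows how each could be computed, assuming "P. Corr." is the Pearson correlation between two activation series and "Cos Sim." is the cosine similarity between two vectors. The dashboard does not state how its values were produced, so the function names and the toy inputs are illustrative assumptions, not the dashboard's own method.

```python
import math

def pearson_corr(x, y):
    """Pearson correlation of two equal-length sequences (assumed meaning of 'P. Corr.')."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances, unnormalised (the 1/n factors cancel in the ratio).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cosine_sim(u, v):
    """Cosine similarity of two equal-length vectors (assumed meaning of 'Cos Sim.')."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy inputs (hypothetical, not taken from the dashboard).
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.8]
neuron_acts = [0.1, -0.3, 0.9, 0.2, 0.4]
print(pearson_corr(feature_acts, neuron_acts))
print(cosine_sim(feature_acts, neuron_acts))
```

Pearson correlation subtracts each series' mean before comparing, while cosine similarity does not, so the two columns can differ noticeably for the same neuron.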
Negative Logits
Token       Logit
limsy       -0.67
kanton      -0.65
€€          -0.63
€/          -0.63
maketitle   -0.60
Ministero   -0.59
pietre      -0.57
BnF         -0.56
bronz       -0.56
zó          -0.55
Positive Logits
Token     Logit
let       1.21
LET       1.15
Let       1.13
Let       1.11
let       1.11
lets      0.98
Lets      0.94
LET       0.89
Lets      0.87
letting   0.84
Activations Density: 0.047%