INDEX
Explanations
phrases related to convincing or persuading others to do something
Neuron Alignment
Index    Value    % of L₁
1527     +0.14    0.5%
976      +0.13    0.4%
1194     +0.12    0.4%
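The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's direction vector. A minimal sketch of that reading follows. The function name `neuron_alignment` and the toy vector are hypothetical, and the assumption that the column is |value| divided by the sum of absolute values is not confirmed by this page.

```python
def neuron_alignment(direction, top_k=3):
    """Return (index, value, share_of_l1) for the top_k entries by absolute value.

    Assumption: "% of L1" means |value| / sum(|all values|).
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful shares
    order = sorted(range(len(direction)),
                   key=lambda i: abs(direction[i]),
                   reverse=True)[:top_k]
    return [(i, direction[i], abs(direction[i]) / l1) for i in order]


# Toy direction vector (illustrative only, not taken from the dashboard).
toy = [0.5, -0.25, 0.125, 0.125]
rows = neuron_alignment(toy, top_k=2)
for index, value, share in rows:
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

Under this reading, shares of 0.4% to 0.5% for the top three neurons would mean the direction is spread thinly across many neurons rather than concentrated in a few.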
Correlated Neurons
Index    P. Corr.    Cos Sim.
1490     +0.14       0.03
976      +0.13       0.03
1194     +0.12       0.02
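For readers unfamiliar with the two column names, the sketch below assumes "P. Corr." is the Pearson correlation between two activation series and "Cos Sim." is the cosine similarity between two vectors. Which series and vectors the dashboard actually compares is not stated here, so the inputs are toy data and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy data (illustrative only): perfectly correlated series, orthogonal vectors.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
c = cosine_similarity([1, 0], [0, 1])
print(f"pearson={r:+.2f}  cosine={c:.2f}")
```

The two measures answer different questions: Pearson correlation subtracts each series' mean before comparing, while cosine similarity compares raw directions, so a pair can score noticeably on one and near zero on the other.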
Negative Logits
accla       -0.84
emphat      -0.83
embra       -0.78
fratern     -0.77
inconce     -0.76
pessi       -0.74
ingenu      -0.74
opport      -0.74
indestru    -0.72
philanth    -0.72
Positive Logits
convince      1.15
convinced     1.08
persuade      1.07
convinces     1.01
persuaded     0.99
convincing    0.97
persuading    0.83
vinced        0.76
vincing       0.75
persuasion    0.74
Activations Density 0.086%