INDEX
Explanations
motivation or hesitation in sharing personal stories online
Neuron Alignment
Index   Value   % of L₁
305     +0.09   0.2%
198     +0.07   0.2%
1510    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1166    +0.09      0.05
1723    +0.07      0.04
305     +0.07      0.04
Negative Logits
aussitôt          -0.64
sûrement          -0.63
soudain           -0.61
précédemment      -0.61
malheureusement   -0.60
justement         -0.59
vanta             -0.57
inerja            -0.55
lentement         -0.55
constamment       -0.54
Positive Logits
bandai          0.54
openly          0.53
smtplib         0.51
pymysql         0.51
admitting       0.51
Personendaten   0.49
admit           0.48
overtly         0.48
anything        0.48
zoon            0.47
Activations Density 0.306%
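
A density figure like the one above is commonly read as the fraction of token positions on which the feature fires. The sketch below shows one way such a number could be computed. It assumes "fires" means an activation strictly above a threshold of zero. The function name activation_density and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its activation is strictly
    greater than `threshold`; the dashboard's exact rule is not stated.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of them with a nonzero activation.
sample = [0.0] * 997 + [0.4, 1.2, 0.05]
print(f"{activation_density(sample):.3%}")
```

Raising the threshold above zero would count only stronger activations and give a lower density for the same data.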