INDEX
Explanations
phrases related to personal experiences or stories
Neuron Alignment
Index   Value   % of L₁
381     +0.15   0.6%
1520    +0.15   0.6%
528     +0.12   0.5%
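The Neuron Alignment table can be read as a ranking over the feature's decoder vector. The sketch below is a minimal Python version of that ranking. It rests on three assumptions: Value is the decoder weight the feature writes to each MLP neuron, neurons are ranked by absolute weight, and % of L₁ is that weight's magnitude divided by the L₁ norm of the whole decoder row. The function name `neuron_alignment` is hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Top-k neurons by |decoder weight|, with each weight's share of the L1 norm.

    Assumption: decoder_row[j] is the weight this feature writes to neuron j.
    Returns a list of (neuron_index, weight, share_of_l1) tuples.
    """
    l1 = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by the magnitude of their decoder weight.
    ranked = sorted(range(len(decoder_row)),
                    key=lambda j: abs(decoder_row[j]), reverse=True)
    return [(j, decoder_row[j], abs(decoder_row[j]) / l1) for j in ranked[:k]]


# Toy decoder row (not data from this feature).
row = [0.0, 0.15, -0.02, 0.12, 0.05]
for index, value, share in neuron_alignment(row, k=2):
    print(f"{index:>5} {value:+.2f} {share:.1%}")
```

Under this reading, a weight of +0.15 at 0.6% implies an L₁ norm of roughly 0.15 / 0.006 ≈ 25 for this feature's decoder row. That estimate is consistent with +0.12 rounding to 0.5% (0.12 / 25 = 0.48%).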
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1919    +0.15           0.04
1520    +0.15           0.04
1415    +0.12           0.03
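The two columns of Correlated Neurons measure different things. One compares activations over tokens, and the other compares directions. The sketch below assumes how the dashboard computes each column. P. Corr. is taken to be the Pearson correlation between the feature's activations and each neuron's activations on the same tokens. Cos Sim. is taken to be the cosine similarity between the feature's decoder vector and the neuron's one-hot basis direction. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant sequence; report no correlation
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def correlated_neurons(feature_acts, neuron_acts, decoder_row, k=3):
    """Rank neurons by Pearson correlation with the feature's activations.

    feature_acts: the feature's activation on each token.
    neuron_acts: neuron_acts[j] holds neuron j's activations on the same tokens.
    decoder_row: the feature's decoder vector in neuron space.
    Returns a list of (neuron_index, pearson_corr, cos_sim) tuples.
    """
    rows = []
    for j, acts in enumerate(neuron_acts):
        # Assumed: neuron j's direction is the one-hot basis vector e_j.
        basis = [1.0 if m == j else 0.0 for m in range(len(decoder_row))]
        rows.append((j, pearson(feature_acts, acts),
                     cosine_similarity(decoder_row, basis)))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]


feature_acts = [0.0, 1.0, 2.0, 3.0]
neuron_acts = [[0.0, 2.0, 4.0, 6.0],   # rises with the feature
               [3.0, 2.0, 1.0, 0.0]]   # falls as the feature rises
print(correlated_neurons(feature_acts, neuron_acts, decoder_row=[3.0, 4.0], k=2))
```

Under the one-hot assumption, the cosine similarity reduces to the neuron's decoder weight divided by the decoder row's L₂ norm. This is why it can be small even for the neurons the feature writes to most strongly.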
Negative Logits
Politica      -0.87
masaj         -0.86
fch           -0.86
Confe         -0.84
gubern        -0.83
toscana       -0.82
hcm           -0.82
Olimpia       -0.81
Simult        -0.81
quelquefois   -0.79
Positive Logits
d         0.69
d         0.69
gdyby     0.63
raczej    0.59
jakby     0.56
)_/¯      0.54
gotta     0.53
hadn      0.53
xbd       0.52
/*        0.51
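The Negative and Positive Logits lists rank output tokens by how much the feature's direction pushes their logits down or up. The sketch below reproduces that ranking using a direct-path approximation. It assumes the feature's direction is already expressed in the residual stream; for an SAE on MLP neurons, that would mean multiplying the decoder row by the MLP output weights first. It multiplies that direction by the unembedding matrix and ignores the final LayerNorm. `logit_effects`, `W_U`, and the toy vocabulary are hypothetical.

```python
def logit_effects(direction, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of a residual-stream direction.

    direction: length-d vector (the feature's direction in the residual stream).
    W_U: d x n_vocab unembedding matrix as nested lists, indexed W_U[m][t].
    Returns (positive, negative), each a list of (token, effect) pairs,
    most extreme first.
    """
    n_vocab = len(vocab)
    # effect[t] = sum over m of direction[m] * W_U[m][t]  (vector-matrix product)
    effects = [sum(direction[m] * W_U[m][t] for m in range(len(direction)))
               for t in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy 2-dimensional model with a 3-token vocabulary.
vocab = ["tok_a", "tok_b", "tok_c"]
W_U = [[1.0, 0.0, 2.0],
       [0.0, 1.0, 2.0]]
positive, negative = logit_effects([1.0, -1.0], W_U, vocab, k=1)
print(positive, negative)
```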
Activation Density 0.073%
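Activation density is the fraction of tokens on which the feature fires. The sketch below assumes that "fires" means a strictly positive activation, as with a ReLU encoder. `activation_density` is a hypothetical name.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


# Toy data: the feature fires on 2 of 4 tokens.
print(f"{activation_density([0.0, 0.5, 0.0, 2.0]):.3%}")  # → 50.000%
```

At the reported density of 0.073%, the feature would be active on roughly 73 of every 100,000 tokens.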