INDEX
Explanations
quotes and statements that include strongly negative emotions or conflicts
Neuron Alignment
Index    Value    % of L₁
2019     +0.16    0.5%
1265     +0.12    0.4%
506      +0.10    0.3%
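The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that it gives each neuron's absolute value as a share of the L₁ norm (sum of absolute values) of the whole vector. A minimal sketch of that calculation, using a hypothetical helper `l1_share` and a toy vector that is not taken from the dashboard:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: this mirrors what "% of L1" means on the dashboard;
    the page itself does not define the column.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("a zero vector has no L1 share")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, chosen only so the arithmetic is easy to follow.
toy_weights = [0.5, -0.25, 0.25]
share = l1_share(toy_weights, 0)  # 0.5 / (0.5 + 0.25 + 0.25)
print(f"{share:.1%}")
```

Under this reading, small shares like the ones in the table would indicate that the vector is spread across many neurons rather than concentrated on a few.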
Correlated Neurons
Index    P. Corr.    Cos Sim.
950      +0.16       0.03
1265     +0.12       0.02
667      +0.10       0.02
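The two columns above report different similarity measures, which is why their values need not agree. Assuming "P. Corr." stands for Pearson correlation, it is the cosine similarity of the two vectors after each has been mean-centered. The page does not say which vectors each metric is computed over, so the sketch below uses hypothetical toy data purely to show how the two measures diverge:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy vectors (not dashboard data): b is a shifted by a constant,
# so the two are perfectly correlated but do not point in the same direction.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
print(pearson_correlation(a, b), cosine_similarity(a, b))
```

Because centering removes any constant offset, the Pearson value here is higher than the raw cosine similarity.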
Negative Logits
impra         -1.29
snoopy        -1.26
scrat         -1.24
hairc         -1.24
horrend       -1.20
exorbit       -1.18
tupperware    -1.18
swarovski     -1.17
cushi         -1.17
ecru          -1.15
Positive Logits
<bos>     0.84
----</    0.76
_(        0.72
sic       0.66
{(        0.64
kemer     0.64
mercad    0.62
)(        0.61
」(       0.61
|(        0.61
Activations Density: 0.057%