Explanations
words related to extreme scenarios or emotions
Neuron Alignment
Index   Value   % of L₁
468     +0.13   0.4%
1380    +0.08   0.2%
1842    +0.08   0.2%
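The Neuron Alignment table appears to list the neurons with the largest weights for this feature, together with each weight's share of the vector's L₁ norm (the sum of absolute weights). The minimal sketch below assumes entries are ranked by absolute weight. `neuron_alignment` and the six-neuron vector are hypothetical illustrations, not the dashboard's own code.

```python
def neuron_alignment(weights, k=3):
    """Return the k entries of `weights` with the largest magnitude, each as
    (index, signed value, share of the vector's L1 norm as a fraction)."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("weight vector has zero L1 norm")
    # Rank neuron indices by absolute weight, largest first (assumed ordering).
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]

# Hypothetical 6-neuron weight vector, for illustration only.
w = [0.02, -0.50, 0.10, 0.25, -0.05, 0.08]
for idx, value, share in neuron_alignment(w):
    print(f"{idx:>5}  {value:+.2f}  {share:.1%}")
```

With this toy vector the L₁ norm is 1.0, so neuron 1 (weight -0.50) accounts for 50% of it; in the card above, a weight of +0.13 covering only 0.4% implies a large L₁ norm spread across many neurons.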
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
468     +0.13           0.05
494     +0.08           0.04
1380    +0.08           0.01
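The two columns of the Correlated Neurons table are standard statistics: Pearson correlation between two activation series and cosine similarity between two vectors. The card does not say exactly which activations or weight vectors are compared, so the inputs below are hypothetical; this is only a sketch of the two formulas, not the dashboard's implementation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical activations of a feature and a neuron over five tokens.
feature_acts = [0.0, 1.2, 0.0, 3.4, 0.5]
neuron_acts = [0.1, 0.9, -0.2, 2.8, 0.4]
print(round(pearson(feature_acts, neuron_acts), 3))
```

Pearson correlation is insensitive to shifting or rescaling either series, whereas cosine similarity is not, which is one reason the two columns can disagree as much as they do above.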
Negative Logits
philanth     -0.80
exorbit      -0.75
horrend      -0.71
shenan       -0.70
Birgit       -0.67
compréhen    -0.67
dilap        -0.66
Karsten      -0.66
cuck         -0.66
chauffe      -0.65
Positive Logits
sentito        0.68
parlato        0.64
isContained    0.64
chiesto        0.63
DISEASES       0.62
provato        0.62
FERENCE        0.61
SCRIBE         0.59
potuto         0.59
VISIONS        0.57
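Logit tables like these are commonly produced by projecting a feature's output direction through the model's unembedding and keeping the tokens with the lowest and highest scores. The card does not confirm that procedure, so treat the sketch below as an assumption. `top_logits`, the 2-dimensional direction, and the four-token vocabulary are hypothetical, and `unembed` here holds one vector per token rather than a real model's unembedding matrix.

```python
def top_logits(direction, unembed, vocab, k=2):
    """Score each token by the dot product of a feature direction with that
    token's unembedding vector; return the k lowest and the k highest."""
    scores = [
        (sum(d * u for d, u in zip(direction, vec)), tok)
        for vec, tok in zip(unembed, vocab)
    ]
    scores.sort()                    # ascending by score
    negative = scores[:k]            # most negative first
    positive = scores[::-1][:k]      # most positive first
    return negative, positive

# Hypothetical 2-d residual space and a four-token toy vocabulary.
direction = [1.0, -0.5]
unembed = [[0.2, 0.9], [0.7, 0.0], [-0.6, 0.1], [0.3, -1.0]]  # one vector per token
vocab = ["alpha", "beta", "gamma", "delta"]
neg, pos = top_logits(direction, unembed, vocab)
print([tok for _, tok in neg], [tok for _, tok in pos])
```

Because the scores come from weights alone, the resulting token lists can include fragments unrelated to the feature's explanation, as in the mixed Italian participles and capitalized suffixes above.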
Activation Density: 0.410%
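Activation density is usually the fraction of tokens on which the feature fires, expressed as a percentage, so 0.410% means roughly 4 tokens in every 1,000. The minimal sketch below assumes "fires" means an activation strictly above zero; `activation_density` and the sample data are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# Hypothetical activations: 4 of 1,000 tokens fire.
acts = [0.0] * 996 + [0.7, 1.3, 0.2, 2.1]
print(f"{activation_density(acts):.3f}%")  # → 0.400%
```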