INDEX
Explanations
phrases related to irony
Neuron Alignment
Index   Value   % of L₁
869     +0.13   0.4%
1187    +0.12   0.4%
537     +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
869     +0.13      0.03
2030    +0.12      0.02
538     +0.11      0.02
Negative Logits
Ressource   -0.91
Cartes      -0.90
quí         -0.85
cannes      -0.84
incess      -0.83
siff        -0.80
glan        -0.80
Rois        -0.79
prodi       -0.79
doman       -0.78
Positive Logits
ironic         0.91
irony          0.86
posX           0.65
ironically     0.64
paradoxical    0.61
blest          0.58
xPos           0.57
overcrow       0.57
umożli         0.56
hypocritical   0.56
Activations Density: 0.125%