INDEX
Explanations
frustrating situations
Neuron Alignment
Index   Value   % of L₁
604     +0.14   0.4%
1143    +0.08   0.2%
1749    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1030    +0.14      0.03
1154    +0.08      0.05
1477    +0.07      0.06
Negative Logits
thermomix   -0.94
milano      -0.89
purcha      -0.86
increa      -0.86
verona      -0.82
suscep      -0.82
sofia       -0.81
ibiza       -0.81
oleo        -0.81
embodi      -0.80
Positive Logits
disappointment   0.84
disappointed     0.80
disappointing    0.75
news             0.64
setback          0.62
disappoint       0.60
defeat           0.60
loss             0.54
kasarigan        0.53
sad              0.53
Activations Density: 0.630%