INDEX
Explanations
references to gifts and gratitude
Neuron Alignment
Index   Value   % of L₁
227     +0.09   0.3%
1013    +0.08   0.2%
211     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
211     +0.09      0.02
584     +0.08      0.03
319     +0.08      0.04
Negative Logits
Février    -0.85
sappi      -0.80
Giugno     -0.79
Sinal      -0.79
Décembre   -0.79
Luglio     -0.79
RSSSF      -0.78
soigne     -0.77
Ottobre    -0.76
exé        -0.76
Positive Logits
intrigued    0.75
curiosity    0.72
unfamiliar   0.63
blurb        0.63
intrigues    0.60
lured        0.58
scepticism   0.57
curious      0.57
myself       0.57
advertised   0.55
Activations Density: 0.607%