INDEX
Explanations
text related to personal experiences and storytelling
Neuron Alignment
Index    Value    % of L₁
1252     +0.12    0.3%
1728     +0.09    0.3%
1042     +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1252     +0.12       0.04
1510     +0.09       0.03
1728     +0.09       0.03
Negative Logits
stili        -0.73
iyon         -0.72
araw         -0.71
alfabe       -0.66
bago         -0.66
OMITTED      -0.65
Ouverture    -0.65
makro        -0.64
comuna       -0.64
Kategor      -0.64
Positive Logits
unspeak      0.96
Wtf          0.92
McLaugh      0.87
apprehen     0.86
Lmao         0.83
gaily        0.82
unlaw        0.79
Souha        0.79
shenan       0.78
friends      0.78
Activations Density: 0.199%