Explanations
mentions of the word "Santa"
Neuron Alignment
Index   Value   % of L₁
597     +0.14   0.6%
1276    +0.14   0.6%
1896    +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1276    +0.14      0.02
1472    +0.14      0.02
597     +0.13      0.02
Negative Logits
cushi         -0.65
pewter        -0.60
Joaqu         -0.58
apprehen      -0.54
callBack      -0.53
errorMsg      -0.52
STRUCTIONS    -0.52
Lázaro        -0.51
userType      -0.51
loveliness    -0.51
Positive Logits
Santa         1.54
Santa         1.52
santa         1.45
santa         1.31
SANTA         1.30
SANTA         1.21
Claus         0.82
Sante         0.81
Chapitre      0.75
akade         0.74
Activations Density: 0.071%