INDEX
Explanations
adverbs of degree or emphasis
Neuron Alignment
Index   Value   % of L₁
227     +0.11   0.3%
1778    +0.09   0.3%
1473    +0.09   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
699     +0.11      0.05
1194    +0.09      0.03
1052    +0.09      0.04
Negative Logits
Mejía     -0.77
Áng       -0.69
Ávila     -0.68
Méndez    -0.66
Cuer      -0.66
Cuen      -0.65
Bahía     -0.65
Cár       -0.62
Hierro    -0.62
Almería   -0.62
Positive Logits
faggot      0.65
sherpa      0.62
ugg         0.62
AppColors   0.61
stit        0.61
volete      0.61
intermitt   0.60
disreg      0.59
legging     0.58
vogli       0.57
Activations Density: 0.090%