INDEX
Explanations
references to specific locations, such as cities and regions
Neuron Alignment
Index   Value   % of L₁
1861    +0.10   0.3%
1137    +0.08   0.2%
289     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
227     +0.10      0.06
981     +0.08      0.06
1439    +0.08      0.04
Negative Logits
nakalista       -0.74
lapto           -0.56
sicura          -0.54
affez           -0.53
writeFieldEnd   -0.52
EconPapers      -0.52
֗               -0.51
AppColors       -0.51
))^{            -0.51
֔               -0.49
Positive Logits
depic       0.90
disagre     0.84
Messieurs   0.81
inev        0.81
fta         0.79
madonna     0.79
fuf         0.79
ftu         0.78
fath        0.78
Mlle        0.77
Activations Density 0.275%