INDEX
Explanations
mentions of locations and names
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
1177    +0.15   0.5%
605     +0.12   0.4%
897     +0.12   0.4%
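The dashboard does not define these columns, so the following is one plausible reading and is offered only as an assumption. "Value" would be the weight linking this feature to each listed neuron. "% of L₁" would be that weight's magnitude divided by the L₁ norm of the feature's whole weight vector. A minimal sketch of that ranking follows. The function name and the toy weights are hypothetical.

```python
def top_aligned_neurons(weights, k=3):
    """Rank neurons by weight and express each weight as a share of the L1 norm.

    `weights` is a hypothetical per-neuron weight vector for a single feature.
    Returns up to k (index, value, fraction_of_l1) tuples, largest value first.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # sorted() is stable (also with reverse=True), so ties keep index order.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


# Toy example: the L1 norm is 2.0, so a weight of 1.0 is 50% of it.
print(top_aligned_neurons([0.25, -0.5, 1.0, 0.25], k=2))
# → [(2, 1.0, 0.5), (0, 0.25, 0.125)]
```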
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
752     +0.15           0.06
897     +0.12           0.07
1828    +0.12           0.07
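The two columns measure related but different quantities. This sketch assumes the first column is a Pearson correlation and the second a cosine similarity. The dashboard does not state which vectors each statistic compares, for example activations over a token sample or weight directions. Pearson correlation is the cosine similarity of the mean-centered vectors, so it ignores a constant offset that cosine similarity does not. The function names are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered inputs.

    Raises ZeroDivisionError if either input is constant.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


x = [1.0, 2.0, 3.0]
y = [b + 10.0 for b in x]  # same shape, shifted by a constant
print(pearson_correlation(x, y))  # 1.0 up to floating-point rounding
print(cosine_similarity(x, y))    # below 1.0 because of the offset
```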
Negative Logits
frastructure     -0.69
tainable         -0.69
tempts           -0.67
sterious         -0.66
bağlantılar      -0.62
complished       -0.62
mentable         -0.61
saites           -0.61
Personendaten    -0.59
quested          -0.59
Positive Logits
vété             0.88
spé              0.82
génie            0.81
maneu            0.81
shenan           0.78
mépris           0.77
intitulée        0.76
Souha            0.75
Jusqu            0.75
Kün              0.74
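Token lists like these are usually read as the feature's direct effect on the output logits. This sketch assumes they come from multiplying the feature's output direction by the model's unembedding matrix and keeping the most extreme tokens. The dashboard itself does not state the procedure. The function names, the row-per-residual-dimension layout of `unembed`, and the toy vocabulary are all hypothetical.

```python
import heapq


def logit_effects(direction, unembed):
    """Per-token logit contribution of one direction.

    Hypothetical layout: `unembed` has one row per residual-stream dimension
    and one column per vocabulary token, so the result is direction @ unembed.
    """
    n_vocab = len(unembed[0])
    return [sum(d * row[t] for d, row in zip(direction, unembed)) for t in range(n_vocab)]


def extreme_logits(effects, vocab, k=10):
    """Return the k most negative (ascending) and k most positive (descending) tokens."""
    pairs = list(zip(vocab, effects))
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])
    return negative, positive


vocab = ["a", "b", "c"]
unembed = [[0.5, -1.0, 2.0], [3.0, 3.0, 3.0]]
effects = logit_effects([1.0, 0.0], unembed)  # only the first row contributes
print(extreme_logits(effects, vocab, k=1))
# → ([('b', -1.0)], [('c', 2.0)])
```

`heapq.nsmallest` and `heapq.nlargest` avoid sorting the whole vocabulary when only the top k entries are needed. They also return results in the same order the two lists above use: most negative first, most positive first.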
Activation Density: 0.600%
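Activation density is read here as the fraction of sampled tokens on which the feature fires. The cutoff for "fires" is an assumption (strictly positive activation), because the dashboard does not give it. Under that reading, 0.600% corresponds to 6 active tokens per 1,000. A minimal sketch with a hypothetical function name:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly greater than `threshold`.

    The default threshold of 0.0 is an assumption, not a documented cutoff.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample: 6 active tokens out of 1,000 gives a density of 0.6%.
sample = [0.0] * 994 + [1.0] * 6
print(f"{activation_density(sample):.3%}")  # → 0.600%
```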