INDEX
Explanations
people's names and specific job titles
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
1843    +0.14   0.4%
1343    +0.14   0.4%
1150    +0.13   0.4%
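The "% of L₁" column can be read as each neuron's share of the feature's total absolute weight. Below is a minimal sketch under these assumptions:

- `decoder_vec` holds the feature's weight on each neuron.
- The share is |w_i| divided by the sum of all |w_j|.

The function name `neuron_alignment` and the ranking by absolute weight are illustrative assumptions, not the dashboard's confirmed method.

```python
def neuron_alignment(decoder_vec, top_k=3):
    """Rank neurons by how strongly a feature's decoder vector loads on them.

    Hypothetical helper. Assumes decoder_vec[i] is the feature's weight on
    neuron i and that "% of L1" means |w_i| / sum_j |w_j|.
    Returns a list of (neuron_index, weight, fraction_of_l1) tuples.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector has zero L1 norm")
    # Sort neuron indices by absolute weight, largest first (ties keep input order).
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:top_k]]


if __name__ == "__main__":
    # Toy 4-neuron decoder vector; prints index, signed weight, and L1 share.
    for idx, value, frac in neuron_alignment([0.25, -0.5, 0.125, 0.125], top_k=2):
        print(f"{idx}  {value:+.2f}  {frac:.1%}")
```

Under this reading, a top share of only 0.4% would mean the feature's weight is spread thinly across many neurons rather than concentrated on any one of them.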
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
981     +0.14           0.11
1843    +0.14           0.08
227     +0.13           0.09
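The two statistics in this table can be sketched as follows.

- **Pearson correlation:** assumed to be computed between the feature's activations and a neuron's activations over the same tokens.
- **Cosine similarity:** the dashboard does not state which two vectors the Cos Sim. column compares. The sketch therefore only shows the generic statistic.

All function names here are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity undefined for a zero vector")
    return dot / (nu * nv)


def top_correlated_neurons(feature_acts, neuron_acts, top_k=3):
    """Rank neurons by Pearson correlation with the feature's activations.

    Assumes neuron_acts[i] lists neuron i's activations on the same tokens
    as feature_acts. Returns (neuron_index, correlation) pairs, highest first.
    """
    scores = [(i, pearson(feature_acts, acts)) for i, acts in enumerate(neuron_acts)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```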
Negative Logits
himself      -0.85
his          -0.71
himself      -0.69
FetchType    -0.69
חיצוניים     -0.67   (Hebrew, "external")
होती         -0.66   (Hindi, a form of "to be")
his          -0.66
His          -0.65
seinen       -0.64   (German, "his")
होगी         -0.63   (Hindi, "will be")
Positive Logits
depic    1.71
maneu    1.71
strick   1.67
fta      1.66
shenan   1.64
thut     1.60
inev     1.59
ftu      1.58
aen      1.57
accla    1.57
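Logit tables like the two above are commonly produced by projecting a feature's output direction onto each token's unembedding vector, then listing the most negative and most positive scores. This minimal sketch rests on several assumptions:

- It keeps only the direct path, ignoring LayerNorm and any later layers.
- `direction` is the feature's contribution to the residual stream.
- `unembed[r][t]` is the unembedding weight from residual dimension `r` to token `t`.

The dashboard's exact procedure and any normalization of the values are not stated here, and `logit_effects` is a hypothetical name.

```python
def logit_effects(direction, unembed, vocab, k=10):
    """Score every token by direction . W_U[:, t] and return the extremes.

    Hypothetical helper. unembed is a list of rows (one per residual
    dimension), each holding one weight per vocabulary token.
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    if len(direction) != len(unembed):
        raise ValueError("direction and unembed must share the residual dimension")
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with token t's unembedding column.
        s = sum(direction[r] * unembed[r][t] for r in range(len(direction)))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    return ranked[:k], ranked[::-1][:k]


if __name__ == "__main__":
    neg, pos = logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]],
                             ["a", "b", "c"], k=1)
    print(neg, pos)
```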
Activation Density: 0.455%
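Activation density is usually read as the fraction of tokens on which the feature is active. Here is a minimal sketch, assuming "active" means an activation strictly above a threshold of 0. Both the threshold and the helper name are assumptions.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of activations strictly above `threshold` (hypothetical helper)."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


if __name__ == "__main__":
    # 2 of 8 toy activations are positive, so the density is 0.25 (25%).
    print(f"{activation_density([0, 0, 0.5, 0, 1.2, 0, 0, 0]):.3%}")
```

On this reading, a density of 0.455% means the feature fires on roughly 4.55 tokens per 1,000.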