INDEX
Explanations
the superlative form of adjectives
New Auto-Interp
Neuron Alignment
Index  Value  % of L₁
81     +0.15  0.9%
409    +0.12  0.7%
274    +0.11  0.6%
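Below is a minimal sketch of how a neuron-alignment table like this one could be computed. It assumes that "Value" is the feature's decoder weight on each MLP neuron and that "% of L₁" is that weight as a fraction of the L₁ norm of the whole decoder vector. The card does not spell out either definition. The decoder vector is a made-up toy, not this feature's real weights, and `neuron_alignment` is a hypothetical helper.

```python
import numpy as np

def neuron_alignment(decoder_vec, k=3):
    """Return (neuron index, weight, weight / L1 norm) for the k largest weights.

    Assumption (hypothetical): neurons are ranked by signed decoder weight, and
    "% of L1" is each weight divided by the L1 norm of the full decoder vector.
    """
    l1 = np.abs(decoder_vec).sum()
    top = np.argsort(decoder_vec)[::-1][:k]  # indices of the largest weights, descending
    return [(int(i), float(decoder_vec[i]), float(decoder_vec[i] / l1)) for i in top]

# Toy 512-neuron decoder vector (hypothetical values, not this feature's weights).
rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=512)
w[[81, 409, 274]] = [0.15, 0.12, 0.11]

for idx, val, frac in neuron_alignment(w):
    print(f"{idx:>4}  {val:+.2f}  {frac:.1%}")
```

Ranking by signed weight matches this table, whose three entries are all positive. A dashboard could equally rank by absolute value to surface strongly negative neurons.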
Correlated Neurons
Index  Pearson Corr.  Cos Sim.
409    +0.15          0.01
81     +0.12          0.01
237    +0.11          0.01
Negative Logits
Ĭ     -1.83
±     -1.79
¦     -1.78
TY    -1.75
§     -1.71
®     -1.70
º     -1.69
©     -1.59
't    -1.57
·¸    -1.51
Positive Logits
aceae      1.76
ière       1.65
quart      1.59
ager       1.57
ÅĪ         1.50
gren       1.49
econó      1.47
imagin     1.43
Americans  1.42
iberal     1.42
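The two logit lists presumably rank output tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch that assumes each value is the feature's output direction projected onto that token's column of the model's unembedding matrix. The card does not say how the values were computed. The matrix, the six-token vocabulary, and the `top_logit_effects` helper are all hypothetical.

```python
import numpy as np

def top_logit_effects(direction, W_U, vocab, k=5):
    """Return the k most negative and k most positive (token, logit effect) pairs.

    Assumption (hypothetical): the effect on a token's logit is the dot product
    of the feature's output direction (d_model,) with that token's column of
    the unembedding matrix W_U (d_model, vocab_size).
    """
    effects = direction @ W_U  # shape: (vocab_size,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy model: d_model = 4, vocabulary of 6 tokens (made-up weights).
vocab = ["aceae", "ière", "quart", "§", "®", "'t"]
W_U = np.array([
    [1.0, 0.5, 0.2, -1.0, -0.5,  0.0],
    [0.0, 0.6, 0.2, -0.2, -0.5, -0.1],
    [0.5, 0.0, 0.1,  0.0,  0.0, -0.3],
    [0.0, 0.1, 0.0, -0.3,  0.0,  0.0],
])
direction = np.array([1.0, 1.0, 0.0, 0.0])

neg, pos = top_logit_effects(direction, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```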
Activation Density: 1.590%
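Activation density is presumably the share of tokens on which the feature is active. Below is a minimal sketch that assumes "active" means a strictly positive activation, as with a ReLU sparse autoencoder, measured over a sample of dataset tokens. The activations are made up, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation is strictly above `threshold`.

    Assumption (hypothetical): a token counts as active when its activation
    exceeds zero, as with a ReLU sparse autoencoder.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy activations over 8 tokens (made-up values): 2 of the 8 are active.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"Activation Density {activation_density(acts):.3%}")  # prints Activation Density 25.000%
```

At the reported 1.590%, the feature would fire on roughly 16 of every 1,000 tokens.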