INDEX
Explanations
instances where the text talks about specific quantitative amounts or comparisons
Neuron Alignment
Index   Value   % of L₁
897     +0.12   0.4%
849     +0.12   0.4%
662     +0.11   0.4%
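The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector. The sketch below assumes that reading; the page itself does not define the column. The function name `l1_share` and the toy vector are hypothetical and are not the feature's real weights.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the L1 norm of `weights`.

    Assumption: "% of L1" means absolute value divided by the sum of
    absolute values over all neurons. This is an interpretation of the
    column header, not something the page states.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector for illustration only.
toy = [0.12, -0.08, 0.30, -0.50]
print(f"{l1_share(toy, 0):.1%}")
```

Under this reading, a value of +0.12 making up 0.4% of L₁ would imply a total L₁ norm of roughly 30, meaning the weight is spread thinly over many neurons rather than concentrated in a few.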
Correlated Neurons
Index   P. Corr.   Cos Sim.
849     +0.12      0.02
897     +0.12      0.02
662     +0.11      0.02
Negative Logits
Token      Logit
OnDelete   -0.43
Feind      -0.43
Tenure     -0.42
Cou        -0.41
Rng        -0.41
pymongo    -0.40
Montag     -0.40
$+$        -0.40
Heber      -0.39
+}\        -0.39
Positive Logits
Token           Logit
barely          0.71
hardly          0.70
autorytatywna   0.70
haast           0.70
Hardly          0.69
nutella         0.66
scarcely        0.65
Життєпис        0.62
Bárbara         0.60
Біографія       0.58
Activations Density: 0.083%
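An activation density figure like this is commonly the fraction of sampled tokens on which the feature fires. The sketch below assumes that definition and a firing threshold of zero; neither is confirmed by the page. The function name `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" counts a token as active when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 10 tokens.
sample = [0.0] * 9 + [1.7]
print(f"{activation_density(sample):.3%}")
```

On this reading, a density of 0.083% would correspond to the feature firing on roughly 8 of every 10,000 tokens.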