INDEX
Explanations
words expressing comparisons and newness
Neuron Alignment
Index   Value   % of L₁
225     +0.07   0.2%
1724    +0.07   0.2%
1462    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1677    +0.07      0.03
1025    +0.07      0.03
2031    +0.07      0.03
Negative Logits
Augu        -0.86
bangkok     -0.81
stockholm   -0.79
venice      -0.77
Gorb        -0.77
Riti        -0.76
lidl        -0.76
fta         -0.75
linden      -0.74
Idem        -0.74
Positive Logits
rarely      0.67
never       0.65
hadn        0.65
seldom      0.62
lately      0.61
recently    0.56
until       0.55
never       0.53
hasn        0.52
haven       0.51
Activations Density 0.232%