INDEX
Explanations
expressions related to future events or occurrences
Neuron Alignment
Index   Value   % of L₁
376     +0.16   0.9%
351     +0.13   0.8%
72      +0.13   0.7%
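
The page does not define the "% of L₁" column. One plausible reading, assumed here, is that it gives each neuron's weight as a share of the L₁ norm of the feature's full weight vector. A minimal sketch under that assumption follows; the helper `l1_share` and the weights are hypothetical and are not values from this dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the L1 norm of weights.

    Assumption: "% of L1" means this ratio; the dashboard does not say so.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("weight vector has zero L1 norm")
    return abs(weights[index]) / l1_norm


# Made-up 4-dimensional weight vector, chosen so the L1 norm is exactly 1.0.
example_weights = [0.5, -0.25, 0.125, 0.125]

share = l1_share(example_weights, 0)
print(f"{share:.1%}")  # prints "50.0%"
```

With thousands of neurons contributing to the norm, a single neuron's share would be small, which is at least consistent with the sub-1% figures in the table.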
Correlated Neurons
Index   P. Corr.   Cos Sim.
72      +0.16      0.03
351     +0.13      0.03
21      +0.13      0.02
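
"P. Corr." and "Cos Sim." presumably stand for Pearson correlation and cosine similarity; the page does not say which vectors each is computed over. The sketch below only illustrates how the two measures relate on made-up vectors: Pearson correlation is the cosine similarity of the mean-centered vectors, so the two can differ for the same pair. The function names are illustrative, not part of any dashboard API.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, computed as the cosine of the centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)


# Made-up vectors: y is x shifted by a constant, so they are perfectly
# correlated even though they do not point in the same direction.
x = [1.0, 2.0, 3.0]
y = [3.0, 4.0, 5.0]

print(pearson_correlation(x, y))
print(cosine_similarity(x, y))
```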
Negative Logits
ounding       -1.69
ounded        -1.65
hire          -1.57
rage          -1.56
herself       -1.52
ERY           -1.50
ulk           -1.45
ride          -1.45
improvement   -1.44
College       -1.41
Positive Logits
stance    2.45
stances   1.86
inct      1.57
ures      1.56
malf      1.48
conj      1.47
ipes      1.47
events    1.43
orph      1.42
things    1.41
Activations Density: 0.013%