Index
Explanations
mentions of full-time work
Neuron Alignment
Index   Value   % of L₁
1013    +0.10   0.3%
674     +0.10   0.3%
168     +0.10   0.3%
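The Neuron Alignment table ranks neurons by the feature's weight on them and reports each weight as a share of the L₁ norm. Below is a minimal sketch of that computation, assuming `% of L₁` means |value| divided by the sum of absolute values of the whole weight vector, and assuming the table lists the most positive entries. The function name `neuron_alignment` and the toy vector are hypothetical, not the dashboard's actual code.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the k largest entries of a feature's weight vector over neurons.

    Each result is (neuron_index, value, percent_of_l1), where percent_of_l1 is
    |value| divided by the L1 norm of the whole vector, times 100.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful alignment
    # Rank neurons by signed value, largest first (assumption: the dashboard
    # lists the most positive entries).
    order = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1) for i in order[:k]]


# Toy 4-neuron vector; a real one would have thousands of entries.
print(neuron_alignment([0.5, -0.25, 1.0, 0.25]))
# → [(2, 1.0, 50.0), (0, 0.5, 25.0), (3, 0.25, 12.5)]
```

Under this definition, a top entry holding only 0.3% of the L₁ norm means no single neuron dominates the feature's weight vector; the mass is spread across many neurons.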
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
404     +0.10           0.02
1492    +0.10           0.02
1081    +0.10           0.02
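The Correlated Neurons table pairs a Pearson correlation with a cosine similarity. The page does not say which vectors are compared; the sketch below assumes per-token activation traces of the feature and of one neuron, and the example data is hypothetical. It shows how the two measures relate: Pearson correlation is the cosine similarity of the mean-centred vectors.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return dot / (nx * ny)

def pearson_correlation(x, y):
    """Pearson correlation: the cosine similarity of the mean-centred vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])

# Hypothetical per-token activations of a feature and of one neuron.
feature_acts = [0.0, 2.0, 0.0, 1.0]
neuron_acts = [0.1, 0.9, 0.2, 0.5]
print(pearson_correlation(feature_acts, neuron_acts))
print(cosine_similarity(feature_acts, neuron_acts))
```

Because the cosine similarity is not mean-centred, the two numbers can differ noticeably for the same pair of vectors.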
Negative Logits
increa     -1.54
effe       -1.43
thut       -1.42
inev       -1.40
nece       -1.39
unden      -1.39
accla      -1.35
volunte    -1.34
impra      -1.32
affor      -1.31
Positive Logits
<bos>       0.95
time        0.73
ruly        0.72
mistak      0.71
time        0.70
sightly     0.66
relenting   0.65
OGND        0.65
Time        0.64
yelitis     0.63
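The logit lists rank vocabulary tokens by how much the feature pushes their logits up or down; the entries are tokenizer pieces, which is why many are word fragments. A minimal sketch, assuming the scores are the direct projection of the feature's output direction through the unembedding matrix (`direction @ W_U`) and ignoring any final LayerNorm scaling the real model applies. The function name, the toy `W_U`, and the toy vocabulary are hypothetical.

```python
def logit_attribution(direction, W_U, vocab, k=3):
    """Return the k most positive and k most negative direct logit contributions.

    direction: feature output direction, length d_model
    W_U: unembedding as a list of d_model rows, each of length len(vocab)
    """
    d_model, n_vocab = len(direction), len(vocab)
    # logit[t] = sum_j direction[j] * W_U[j][t]  (i.e. direction @ W_U)
    logits = [sum(direction[j] * W_U[j][t] for j in range(d_model)) for t in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda t: logits[t])  # most negative first
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in order[::-1][:k]]
    return positive, negative

# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
vocab = ["a", "b", "c", "d"]
W_U = [[0.5, -1.0, 2.0, 0.0],
       [0.0, 0.0, 0.0, -4.0]]
pos, neg = logit_attribution([1.0, 0.5], W_U, vocab, k=2)
print(pos)  # → [('c', 2.0), ('a', 0.5)]
print(neg)  # → [('d', -2.0), ('b', -1.0)]
```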
Activation Density: 0.091%
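Activation density is the fraction of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0.0; the dashboard's actual threshold and token sample are not stated. At 0.091%, the feature would be active on roughly 91 of every 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature's activation exceeds threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical activations over 8 tokens: the feature fires on 1 of them.
print(activation_density([0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 0.0, 0.0]))  # → 12.5
```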