Explanations
phrases indicating a passage of time or a delay in action
Neuron Alignment
Index   Value   % of L₁
1872    +0.09   0.3%
1323    +0.08   0.2%
1604    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
690     +0.09      0.04
1363    +0.08      0.03
1343    +0.08      0.03
Negative Logits
Token              Logit
ButtonModule       -0.67
paddingVertical    -0.65
marginHorizontal   -0.64
}]                 -0.60
MatButtonModule    -0.59
DeleteMapping      -0.59
PutMapping         -0.58
alignSelf          -0.57
ba                 -0.57
endl               -0.57
Positive Logits
Token       Logit
awhile      1.81
affor       1.55
unden       1.53
fortn       1.48
viciss      1.42
indestru    1.42
philanth    1.40
stockholm   1.39
wherea      1.39
hoody       1.37
Activations Density 0.306%