INDEX
Explanations
interrogative sentences related to specific events or actions
Neuron Alignment
Index   Value   % of L₁ norm
1741    +0.18   0.6%
50      +0.14   0.4%
1967    +0.13   0.4%
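Below is a minimal sketch of how a Neuron Alignment table like this can be computed. It rests on two assumptions that the dashboard does not state. First, the values are taken to be the feature's decoder weights over MLP neurons. Second, neurons are taken to be ranked by absolute weight. The function name `neuron_alignment` and the toy weights are hypothetical.

```python
def neuron_alignment(weights, k=3):
    """Top-k neurons by |weight|, with each weight's share of the L1 norm.

    `weights` is assumed to be the feature's decoder row, one entry per
    MLP neuron (hypothetical layout, not confirmed by the dashboard).
    """
    l1_norm = sum(abs(w) for w in weights)
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    # Each entry: (neuron index, signed weight, fraction of the L1 norm).
    return [(i, weights[i], abs(weights[i]) / l1_norm) for i in ranked[:k]]


# Toy example: the L1 norm is 0.25 + 0.5 + 0.125 + 0.125 = 1.0.
print(neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2))
# [(1, -0.5, 0.5), (0, 0.25, 0.25)]
```

Under these assumptions, a weight of +0.18 holding 0.6% of the L₁ norm implies a total L₁ norm of roughly 30. The feature's weight is therefore spread thinly across many neurons rather than concentrated in a few.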
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1141    +0.18           0.04
981     +0.14           0.03
1373    +0.13           0.01
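The two statistics in the Correlated Neurons table can be sketched as follows. The sketch assumes that both are computed from per-token activations of the feature and of each neuron, taken over the same tokens. The function names and toy activations are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance of the mean-centred series over their norms."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def cosine(xs, ys):
    """Cosine similarity: the same ratio without subtracting the means."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys)))


# Hypothetical per-token activations for one feature and one neuron.
feature_acts = [0.0, 0.0, 2.0, 0.0]
neuron_acts = [1.0, 1.0, 2.0, 1.0]  # equals 1 + 0.5 * feature_acts
print(pearson(feature_acts, neuron_acts))  # ≈ 1.0
print(cosine(feature_acts, neuron_acts))   # ≈ 0.756
```

In the toy data, the neuron is an exact affine function of the feature, so the Pearson correlation is 1. The cosine similarity is only about 0.76, because cosine does not subtract the means and the neuron's constant offset lowers it. This is why the two columns need not agree.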
Negative Logits
reluct      -1.76
shenan      -1.72
scrat       -1.71
increa      -1.70
impra       -1.69
disagre     -1.65
maneu       -1.65
disreg      -1.63
tolerably   -1.62
cushi       -1.59
Positive Logits
Literat      0.95
minimalis    0.94
höl          0.92
Horário      0.91
Dès          0.80
Etimo        0.80
Composição   0.80
Pró          0.79
kosme        0.79
Até          0.77
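Below is a minimal sketch of how the Negative and Positive Logits lists can be produced. It assumes that each token's score is the dot product of the feature's output direction in the residual stream with that token's column of the unembedding matrix. It also ignores the final layer norm, which is a common simplification. The names `logit_effects`, `direction`, and `W_U` and the toy values are hypothetical.

```python
def logit_effects(direction, W_U, vocab, k=3):
    """Return the k most positive and k most negative direct logit effects.

    direction: the feature's output direction in the residual stream (length d_model).
    W_U: unembedding matrix given as d_model rows of len(vocab) entries.
    """
    d_model = len(direction)
    # Score each token: dot product of the direction with that token's unembedding column.
    scores = [sum(direction[d] * W_U[d][t] for d in range(d_model)) for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]  # most negative first
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]  # most positive first
    return positive, negative


# Toy model with d_model = 2 and a three-token vocabulary.
W_U = [[0.5, -1.0, 2.0],
       [0.0, 0.0, 0.0]]
positive, negative = logit_effects([1.0, 3.0], W_U, ["a", "b", "c"], k=1)
print(positive)  # [('c', 2.0)]
print(negative)  # [('b', -1.0)]
```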
Activation Density: 0.060%
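Activation density is commonly reported as the fraction of tokens on which the feature's activation is nonzero. The minimal sketch below makes that assumption, using a threshold of zero, which the dashboard does not state. The function name is hypothetical.

```python
def activation_density(acts):
    """Fraction of tokens on which the feature fires (activation > 0)."""
    return sum(1 for a in acts if a > 0) / len(acts)


print(activation_density([0.0, 0.3, 0.0, 0.0]))  # 0.25
```

At the reported density of 0.060% (a fraction of 0.0006), the feature fires on roughly 6 of every 10,000 tokens.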