INDEX
Explanations
phrases indicating a point in time or a condition that needs to be met
Neuron Alignment
Index   Value   % of L₁
1253    +0.10   0.3%
314     +0.10   0.3%
752     +0.09   0.3%
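The "% of L₁" column appears to give each weight's share of the feature's decoder vector L₁ norm, i.e. |w_i| / Σ_j |w_j|. A minimal sketch of how such a table could be produced, assuming the decoder row is available as a plain list of per-neuron weights and that neurons are ranked by signed value; `neuron_alignment` and the toy vector are hypothetical, not the dashboard's code:

```python
# Sketch: rank neurons by a feature's decoder weight and report each weight's
# share of the decoder vector's L1 norm.
# Assumption: "Neuron Alignment" lists the largest signed decoder entries.

def neuron_alignment(decoder_row, k=3):
    """Return (neuron_index, value, pct_of_l1) for the k largest weights."""
    l1 = sum(abs(w) for w in decoder_row)
    ranked = sorted(range(len(decoder_row)),
                    key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1)
            for i in ranked[:k]]

# Hypothetical toy decoder row (its L1 norm is 1.0).
decoder_row = [0.05, -0.20, 0.40, 0.10, -0.05, 0.20]
for idx, value, pct in neuron_alignment(decoder_row):
    print(f"{idx}  {value:+.2f}  {pct:.1f}%")
```

Reading the rounded table values this way, a weight of +0.10 at 0.3% implies an L₁ norm on the order of 0.10 / 0.003 ≈ 33, which suggests this feature is spread across many neurons rather than aligned with any single one.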
Correlated Neurons
Index   P. Corr.   Cos Sim.
1438    +0.10      0.04
1993    +0.10      0.05
849     +0.09      0.04
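One plausible reading, based on how open-source SAE dashboards such as sae_vis describe these columns, is that both compare the feature's activations with a neuron's activations over the same tokens: Pearson correlation centers both series before normalizing, while cosine similarity does not. A minimal sketch under that assumption; the function names and toy activations are hypothetical:

```python
import math

# Sketch under an assumption: both "Correlated Neurons" columns compare the
# feature's activation with a neuron's activation over the same tokens.

def cosine(x, y):
    """Uncentered cosine similarity between two activation series."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation: center both series, then take their cosine."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

# Hypothetical toy activations over six tokens.
feature_acts = [0.0, 0.0, 1.5, 0.0, 0.8, 0.0]
neuron_acts = [0.3, 0.2, 0.9, 0.4, 0.6, 0.1]
print(f"P. Corr. {pearson(feature_acts, neuron_acts):+.2f}")
print(f"Cos Sim. {cosine(feature_acts, neuron_acts):.2f}")
```

Because only Pearson correlation subtracts the mean, the two columns need not agree, and a sparse feature that is zero on most tokens can score quite differently on each.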
Negative Logits
FTFY       -0.81
disreg     -0.77
YMMV       -0.69
impra      -0.69
Noice      -0.68
unspeak    -0.66
subgoal    -0.65
ftw        -0.65
prolly     -0.64
indescri   -0.64
Positive Logits
masaj      0.82
susun      0.81
umo        0.76
CiNii      0.75
thuy       0.75
lele       0.73
granada    0.72
mariscos   0.69
uhr        0.69
nuoc       0.69
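The logit lists are typically the feature's direct effect on the output vocabulary: its decoder direction multiplied by the unembedding matrix, with the most negative and most positive tokens shown. Entries such as "disreg", "impra" and "indescri" are subword tokens, not whole words. A minimal sketch under that assumption, ignoring any final LayerNorm; `logit_effects`, the toy unembedding matrix and the toy vocabulary are hypothetical:

```python
# Sketch: direct logit effect of a feature = decoder_dir @ unembed,
# then the k lowest and k highest scoring tokens.
# Assumption: unembed has one row per residual dimension, one column per token.

def logit_effects(decoder_dir, unembed, vocab, k=2):
    scores = [sum(d * unembed[r][t] for r, d in enumerate(decoder_dir))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive

# Hypothetical toy model: 2 residual dimensions, 4-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.6, 0.0],
           [0.4, 0.2, -0.2, 0.8]]
vocab = ["a", "b", "c", "d"]
neg, pos = logit_effects(decoder_dir, unembed, vocab)
print("Negative:", [(tok, round(s, 2)) for tok, s in neg])
print("Positive:", [(tok, round(s, 2)) for tok, s in pos])
```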
Activation Density: 0.320%
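Assuming density means the percentage of sampled tokens on which the feature's activation is strictly positive, 0.320% corresponds to roughly one token in 312 (1 / 0.0032 = 312.5). A minimal sketch of that calculation; `activation_density` and the toy activations are hypothetical:

```python
# Sketch: activation density = percentage of tokens where the feature fires.
# Assumption: "fires" means activation strictly above a threshold of 0.

def activation_density(acts, threshold=0.0):
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)

# Hypothetical toy activations: 3 of 1000 tokens fire.
acts = [0.0] * 997 + [0.7, 1.2, 0.3]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```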