INDEX
Explanations
phrases related to time, specifically "now", "again", and various mentions of weeks
Neuron Alignment
Index   Value   % of L₁
2034    +0.18   0.5%
1129    +0.11   0.3%
1013    +0.10   0.3%
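The Neuron Alignment table lists, for a handful of neuron indices, a weight and that weight's share of an L1 norm. Below is a minimal sketch of how such a table could be produced. It assumes, without confirmation from the source, that the values are read directly off the feature's decoder vector (one float per neuron) and ranked by signed value. The function name `neuron_alignment` is hypothetical.

```python
def neuron_alignment(decoder, k=3):
    """Return the k neurons with the largest decoder weights.

    Hypothetical helper: assumes `decoder` is a list of floats, one per
    neuron. Each result is (neuron index, weight, |weight| / L1 norm).
    """
    l1 = sum(abs(w) for w in decoder)
    # Rank neuron indices by signed weight, largest first (an assumption).
    top = sorted(range(len(decoder)), key=lambda i: decoder[i], reverse=True)[:k]
    return [(i, decoder[i], abs(decoder[i]) / l1) for i in top]


# Print rows in the same layout as the table above.
for idx, value, share in neuron_alignment([0.1, -0.4, 0.3, 0.2], k=2):
    print(f"{idx}  {value:+.2f}  {share:.1%}")
```

The "% of L₁" column indicates how concentrated the feature is on a single neuron: a top weight holding only 0.5% of the norm suggests the feature is spread across many neurons rather than aligned with one.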
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1129    +0.18           0.06
1781    +0.11           0.05
1372    +0.10           0.05
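The Correlated Neurons table pairs each neuron with a Pearson correlation and a cosine similarity. The sketch below shows both statistics in plain Python. It assumes the correlation is computed between the feature's activations and each neuron's activations over the same tokens. Which vectors the dashboard compares for the cosine similarity is not stated here, so `cosine` is shown only as the generic formula. All function names and inputs are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # Design choice: treat a constant series as uncorrelated instead of dividing by zero.
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity of two vectors (generic formula)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature's activations.

    feature_acts: feature activation per token (hypothetical input).
    neuron_acts:  dict of neuron index -> activations on the same tokens.
    """
    scores = {i: pearson(feature_acts, acts) for i, acts in neuron_acts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


print(correlated_neurons([0, 1, 2, 3], {5: [0, 2, 4, 6], 7: [3, 2, 1, 0], 9: [1, 0, 1, 0]}, k=2))
```

Low values in both columns, as in the table above, would be consistent with the feature not being a re-expression of any single neuron.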
Negative Logits
Vegeu     -0.72
Avez      -0.67
iria      -0.65
Faites    -0.65
viewDid   -0.65
Quien     -0.64
Misión    -0.64
niño      -0.63
Nossa     -0.63
Біо       -0.62
Positive Logits
pamph      0.98
indestru   0.95
disagre    0.88
reluct     0.88
cuck       0.85
snapback   0.84
caprice    0.84
accla      0.84
madonna    0.84
disgra     0.84
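The two logit lists rank vocabulary tokens by how strongly the feature's output pushes their logits up or down. A minimal sketch follows, assuming the scores are a plain projection of the feature's decoder direction through the unembedding matrix W_U. It ignores any final LayerNorm or scaling the dashboard may apply. The function name `logit_effects` and its inputs are hypothetical.

```python
def logit_effects(decoder, unembed, vocab, k=10):
    """Project a feature's decoder direction onto the vocabulary.

    decoder: d_model floats (the feature's output direction).
    unembed: d_model rows, each holding one float per vocab token (W_U).
    vocab:   token strings, one per unembedding column.
    Returns (top-k positive, top-k negative) lists of (token, score).
    """
    scores = [
        sum(decoder[j] * unembed[j][t] for j in range(len(decoder)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda p: p[1])  # ascending
    negative = ranked[:k]          # most suppressed tokens first
    positive = ranked[::-1][:k]    # most promoted tokens first
    return positive, negative


pos, neg = logit_effects([1.0, -1.0], [[1, 0, 3], [0, 1, 1]], ["a", "b", "c"], k=1)
print(pos, neg)
```

Because the score is linear in the decoder vector, the positive and negative lists are independent readouts of the same projection; they need not share a theme, which matches the mixed tokens listed above.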
Activation Density: 0.211%
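Activation density is commonly read as the fraction of tokens on which the feature is active. On that reading, 0.211% corresponds to roughly 2,110 firing tokens per million. The sketch below assumes "active" means an activation strictly above zero. The name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


density = activation_density([0, 0.5, 0, 0, 1.2, 0, 0, 0, 0, 0])
print(f"Activation Density: {density:.3%}")
```

The `threshold` argument is included because some pipelines count only activations above a small positive cutoff. With the default of zero, any positive activation counts as firing.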