INDEX
Explanations
references to perspectives and observations on systemic issues
Negative Logits
essen    -0.19
ativ     -0.16
ffd      -0.15
umat     -0.15
bage     -0.15
lich     -0.15
ureka    -0.15
olio     -0.14
inals    -0.14
als      -0.14
Positive Logits
merely       0.28
simply       0.20
只是         0.18
-selector    0.17
แค           0.17
instead      0.17
nor          0.16
juste        0.16
just         0.16
nor          0.16
Activations Density 0.097%