    Explanations

    rhetorical questions or expressions of surprise

    Negative Logits
    "gni"         -0.16
    "eniable"     -0.15
    "enty"        -0.14
    "nio"         -0.14
    " anything"   -0.14
    "ulty"        -0.14
    "mie"         -0.14
    "ito"         -0.14
    "emonic"      -0.14
    "ä¹ħ"         -0.14   (byte-level BPE rendering of 久)
    Positive Logits
    " else"        0.21
    " do"          0.20
    "aya"          0.19
    " did"         0.19
    " better"      0.18
    " timing"      0.18
    "timing"       0.17
    "soever"       0.17
    "ser"          0.16
    "\telse"       0.15
    Activation density: 0.083%

    No Known Activations