INDEX
    Explanations

    expressions of surprise or curiosity

    Negative Logits
        Token (quoted)   Logit
        " "              -0.16
        "iev"            -0.15
        " ride"          -0.15
        "icontrol"       -0.15
        "134"            -0.15
        " dirt"          -0.14
        "YA"             -0.14
        "-upload"        -0.14
        "istics"         -0.14
        " null"          -0.14

    Positive Logits
        Token (quoted)   Logit
        "IGHL"            0.16
        "اكن"             0.15
        "ampie"           0.15
        "ñana"            0.15
        "sembling"        0.14
        "etting"          0.14
        " zeigen"         0.13
        "_caption"        0.13
        "atik"            0.13
        ".adv"            0.13
    Activation Density: 0.013%

    No Known Activations
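
    For scale, an activation density of 0.013% means the feature fires on roughly 13 of every 100,000 tokens. The sketch below is a minimal illustration of that arithmetic. It assumes density is the fraction of sampled tokens whose activation is strictly positive. The page does not state the sampling corpus or the threshold, and the helper name `activation_density` is hypothetical.

```python
def activation_density(activations):
    """Fraction of tokens whose feature activation is strictly positive.

    Assumption: `activations` is a flat sequence of per-token activations
    for one feature, and "firing" means activation > 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


# Hypothetical sample: 13 firing tokens out of 100,000.
sample = [0.0] * 99_987 + [1.0] * 13
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.013%
```

    At this rate a feature can easily go unobserved in a small sample of text, which is consistent with a page that reports a nonzero density alongside no known activations.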