INDEX
    Explanations

    words related to astonishment or surprise

    Negative Logits
    terday      -0.80
    istance     -0.72
    ibel        -0.72
    andre       -0.69
    pai         -0.69
    idences     -0.68
    uctor       -0.66
    hip         -0.66
    itism       -0.66
    idential    -0.65
    Positive Logits
    gers        0.86
    warts       0.84
    matic       0.81
    asus        0.79
    weed        0.79
    ga          0.78
    ues         0.76
    arty        0.74
    Champ       0.74
    ogg         0.73
    Activation Density: 0.028%

    No Known Activations