INDEX
    Explanations

    the word "os" and variations of it

    Negative Logits
    ãĤ¡          -0.71
    rence        -0.66
    bler         -0.65
    taker        -0.64
    OWS          -0.63
    ASED         -0.63
    BRE          -0.62
    OUT          -0.61
    ufact        -0.60
    TextColor    -0.59
    Positive Logits
    hiba         1.46
    heet         1.24
    ophical      1.19
    keleton      1.11
    leep         1.07
    omething     1.06
    aurus        1.06
    opher        1.05
    ocial        1.04
    mith         1.04
    Activation Density: 0.040%

    No Known Activations