    Explanations

    phrases that express uncertainty or questioning

    Negative Logits
    Token          Logit
    "طب"           -0.16
    "TintColor"    -0.15
    "zed"          -0.14
    "-selector"    -0.14
    "elon"         -0.14
    "SSI"          -0.14
    "ecure"        -0.14
    "annes"        -0.14
    "jíž"          -0.14
    "rement"       -0.14
    Positive Logits
    Token          Logit
    " tell"        0.79
    " Tell"        0.70
    " telling"     0.68
    "tell"         0.66
    " tells"       0.66
    "Tell"         0.61
    " Tells"       0.53
    " told"        0.52
    "告诉"         0.47
    ".tell"        0.44
    Activation Density: 0.083%

    No Known Activations
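
To give the density figure above some scale: a value of 0.083% corresponds to roughly one firing per 1,200 tokens. The sketch below is illustrative only. It assumes the density is the fraction of sampled token positions on which the feature's activation is above zero, which this page does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition, not confirmed by the dashboard itself.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic data (not taken from the dashboard): one firing in 1,205 tokens.
sample = [0.0] * 1204 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.083%
```

Under this assumed definition, a feature this sparse would be silent on almost all short prompts.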