INDEX
    Explanations

    mentions of interest or being interested in something

    expressions of curiosity or engagement towards various topics or activities

    Negative Logits
    Token          Logit
    "Fail"         -0.65
    "UES"          -0.63
    " misunder"    -0.62
    "âĶĢ"          -0.62
    " patriarch"   -0.61
    " welf"        -0.60
    "unts"         -0.59
    " stacked"     -0.59
    "llan"         -0.57
    "ãĥ¼ãĥĨãĤ£"    -0.57

    Positive Logits
    Token          Logit
    "ãĥĦ"          0.77
    " enough"      0.74
    " therein"     0.74
    "iltr"         0.71
    "inery"        0.71
    "ately"        0.70
    " in"          0.67
    "igent"        0.66
    "illed"        0.65
    "iotics"       0.64

    Act Density 0.037%

    No Known Activations