INDEX
    Explanations

    phrases related to categorization or classification based on characteristics

    phrasings that involve the concept of "sort of" in relation to various topics

    Negative Logits
    "ajor"      -0.83
    "ĸļ"        -0.79
    "ļéĨĴ"      -0.69
    "hens"      -0.67
    " Zup"      -0.67
    "enes"      -0.66
    "tec"       -0.65
    "hend"      -0.65
    "orest"     -0.65
    "Cub"       -0.64
    Positive Logits
    " thing"     0.91
    " luck"      0.79
    " stuff"     0.71
    " fun"       0.69
    " nerve"     0.69
    "catentry"   0.68
    " crap"      0.68
    " nonsense"  0.68
    " humility"  0.66
    " things"    0.66
    Activation Density: 0.040%
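The logit lists and density figure can be approximated as follows. This is a minimal sketch, assuming the common setup where a feature's direct logit effect is its decoder direction multiplied by the model's unembedding matrix, and where density is the share of token positions with a strictly positive activation. The function names, array shapes, and toy data are hypothetical, not the dashboard's actual code.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical shapes: decoder_dir is (d_model,), W_U is (d_model, d_vocab).
    """
    logits = decoder_dir @ W_U          # direct effect of the feature on every token's logit
    order = np.argsort(logits)          # ascending: most negative first
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

def activation_density(acts):
    """Fraction of token positions where the feature activation is strictly positive."""
    acts = np.asarray(acts)
    return float((acts > 0).mean())

# Toy usage with made-up numbers (assumption, not real model weights).
vocab = ["a", "b", "c", "d"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
density = activation_density([0.0, 0.3, 0.0, 0.0])
```

A density of 0.040% corresponds to a fraction of 0.0004, i.e. the feature fires on roughly 4 of every 10,000 tokens.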

    No Known Activations