INDEX
    Explanations

    phrases indicating nearly the same or very similar items, circumstances, or actions

    phrases and words emphasizing frequency or recurrence

    Negative Logits
     Louie       -0.84
     Kirby       -0.65
     Bard        -0.64
     Ki          -0.62
     Colors      -0.62
    andise       -0.58
    gio          -0.57
     Blend       -0.57
     Jord        -0.57
    ":["         -0.56
    Positive Logits
    rontal       0.73
    lyak         0.72
    èª           0.71
    osher        0.68
    haust        0.66
     spoiler     0.65
    ï¸           0.65
    maxwell      0.64
    Ö            0.64
    Ïī           0.63
    Activation Density: 0.188%

    No Known Activations