INDEX
    Explanations

    specific nouns or adjectives

    Negative Logits
    ihad            0.53
     Hence          0.53
     Zudem          0.49
     كما            0.48
     Admittedly     0.48
     Unfortunately  0.48
     Similarly      0.47
    ¹               0.47
     Because        0.47
    ¹.              0.47
    Positive Logits
     newest         0.72
     possibility    0.66
     distinction    0.63
     quantity       0.63
     fact           0.60
     latest         0.58
     phrase         0.58
     authenticity   0.57
     greatest       0.56
     hyperlink      0.55
    Activation Density: 0.000%

    No Known Activations