    Explanations

    words with negation prefixes

    Negative Logits
    "ãĤ¼ãĤ¦ãĤ¹" (ゼウス)   -0.66
    "imeter"            -0.64
    " showc"            -0.64
    "urrency"           -0.61
    " uninterrupted"    -0.61
    "Ĥİ"                -0.60
    "imeters"           -0.59
    "Ͻ"                 -0.58
    " Lines"            -0.57
    "ħĭ"                -0.57
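Several of the negative-logit tokens look garbled because they appear to be shown in the byte-level BPE alphabet used by GPT-2-style tokenizers. In that alphabet, bytes that are not printable are remapped to code points starting at U+0100. The page does not name the tokenizer, so this sketch assumes that scheme. It inverts the mapping to recover the underlying text. The helper names `bytes_to_unicode`, `CHAR_TO_BYTE` and `decode_token` are illustrative, and so is the fallback for characters outside the table.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every other byte is shifted to an unused code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

# Inverse table: displayed character -> original byte.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    """Recover readable text from a token shown in the byte-level alphabet."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Assumption: characters outside the table (e.g. a literal space)
            # were already rendered as plain text, so keep their UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    # Incomplete multi-byte sequences become U+FFFD replacement characters.
    return raw.decode("utf-8", errors="replace")

for tok in ["ãĤ¼ãĤ¦ãĤ¹", "Ĥİ", " showc"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, "ãĤ¼ãĤ¦ãĤ¹" decodes to the katakana ゼウス. By contrast, "Ĥİ" maps to the bytes 0x82 0x8E and "ħĭ" maps to 0x85 0x8B. These are bare UTF-8 continuation bytes, so they decode only to replacement characters on their own.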
    Positive Logits
    "withstanding"      0.94
    "quite"             0.80
    "yet"               0.78
    "know"              0.72
    "Quite"             0.71
    "icable"            0.70
    "necess"            0.70
    "nice"              0.69
    "Notice"            0.68
    "onso"              0.67
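The page does not say how the logit lists are produced. For sparse-autoencoder features, one common approach is to project the feature's decoder direction through the model's unembedding matrix and then read off the lowest and highest entries. This sketch assumes that approach. The names `top_logit_effects`, `decoder_dir` and `W_U` are hypothetical, and the toy matrices stand in for real model weights.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    decoder_dir: shape (d_model,), the feature's decoder vector (assumed).
    W_U: shape (d_model, vocab_size), the unembedding matrix (assumed).
    vocab: list of token strings indexed by token id.
    """
    logits = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with d_model = 2 and a three-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 0.9],
              [0.3, 0.1, -0.4]]),
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)  # [('b', -0.2)] [('c', 0.9)]
```

In this reading, each listed value is how much the feature, at unit activation, directly raises or lowers that token's logit. It ignores any effects routed through later layers.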
    Activation density: 0.066% (share of sampled tokens on which the feature fires)
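Activation density is a simple ratio: firing positions divided by total positions. The page does not state its sample size or what counts as "firing". This sketch assumes activations are already collected into an array and that "active" means strictly above zero. The function name `activation_density` and the sample data are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())

# Hypothetical sample: the feature fires on 66 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:66] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.066%
```

At a density of 0.066%, the feature fires on roughly one token in 1,500. That rarity is consistent with the page having no activation examples to show.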

    No Known Activations