INDEX
    Explanations

    phrases indicating strong opinions or beliefs

    Negative Logits
    olis       -0.84
    UES        -0.74
    uctions    -0.74
    oute       -0.73
    iens       -0.72
    ischer     -0.71
    ULTS       -0.68
    olen       -0.67
    ahime      -0.67
    ãĥĩãĤ£     -0.66
    Positive Logits
     whatsoever   1.03
     respecting   0.84
     how          0.81
    llor          0.78
     whatever     0.78
     whether      0.77
    theless       0.74
    ¬¼            0.69
    é¾įå¥ij士     0.68
    ileged        0.68
    Act Density 0.009%

    No Known Activations