INDEX
    Explanations

    phrases related to opinions, beliefs, claims, and speculations

    Negative Logits
    ciating         -0.85
    ients           -0.77
    tesy            -0.70
    ĸļ              -0.67
    ften            -0.63
    viron           -0.62
    ibles           -0.62
    rals            -0.60
    Contents        -0.60
     Himself        -0.59

    Positive Logits
     parallels       0.77
     errone          0.76
     incorrectly     0.72
     similarities    0.65
     negatively      0.63
     resemb          0.63
     doom            0.63
     why             0.62
     whether         0.61
     aloud           0.61

    Activation Density: 0.247%

    No Known Activations