INDEX
    Explanations

    phrases indicating knowledge or awareness

    phrases asserting common knowledge or consensus

    Negative Logits
    "pex"        -0.88
    "ermanent"   -0.78
    "erva"       -0.76
    "osi"        -0.73
    "cific"      -0.73
    "rentice"    -0.73
    "onial"      -0.71
    "ĪĴ"         -0.71
    "oshenko"    -0.71
    "cohol"      -0.70
    Positive Logits
    "ledge"        0.89
    "ledged"       0.87
    " how"         0.84
    " beforehand"  0.74
    "lege"         0.71
    "ariat"        0.71
    " nothing"     0.69
    "л"            0.68
    "nothing"      0.67
    " why"         0.66
    Activation density: 0.076%

    No Known Activations