INDEX
    Explanations

    phrases indicating agreement or alignment

    statements and assertions about equivalence or similarity across different subjects or contexts

    Negative Logits
    "zos"            -0.73
    "ded"            -0.60
    "watch"          -0.59
    "ichen"          -0.55
    "urat"           -0.54
    " interrupts"    -0.54
    "ffff"           -0.54
    "stru"           -0.53
    "agram"          -0.52
    "NetMessage"     -0.52
    Positive Logits
    "ï¸ı"            0.83
    " Nationwide"    0.71
    " everywhere"    0.66
    "unity"          0.64
    "pn"             0.63
    "ivism"          0.62
    "oat"            0.62
    " applies"       0.61
    "sburgh"         0.61
    "blance"         0.61
    Activation Density: 0.188%

    No Known Activations