INDEX
    Explanations

    phrases expressing desires or intentions

    expressions of desire or preference

    Negative Logits
    Token         Logit
    "ulty"        -0.63
    "idious"      -0.62
    "ccording"    -0.61
    "onut"        -0.60
    "illian"      -0.56
    "oshenko"     -0.55
    "abal"        -0.55
    "asse"        -0.55
    "eteria"      -0.54
    "iling"       -0.54
    Positive Logits
    Token               Logit
    " to"               1.05
    " assurances"       0.77
    " thereto"          0.75
    " clarification"    0.74
    " nothing"          0.70
    "to"                0.66
    " unto"             0.65
    " ta"               0.64
    " someone"          0.63
    "HT"                0.61
    Activation Density: 0.044%

    No Known Activations