INDEX
    Explanations

    phrases related to unity or bringing things together

    phrases related to reasons or justifications

    Negative Logits
    Else          -0.66
    oros          -0.65
    upon          -0.63
    ican          -0.63
    ior           -0.62
    iod           -0.61
     Afterwards   -0.61
    paren         -0.61
    iov           -0.60
    ington        -0.60

    Positive Logits
     ones         1.10
     overarching  1.04
     none         1.00
     simplest     0.98
     nutshell     0.98
     one          0.98
     favorites    0.95
     particular   0.87
     suffice      0.87
     predominant  0.85
    Act Density 0.399%

    No Known Activations