    Explanations

    phrases related to contrasting or specifying different categories or options

    references to relationships and social connections

    Negative Logits
    Token      Logit
    "cki"      -0.67
    "rave"     -0.67
    "WF"       -0.57
    "haw"      -0.56
    "KI"       -0.55
    "itute"    -0.55
    "WD"       -0.55
    "ady"      -0.55
    "Skip"     -0.53
    "RM"       -0.52
    Positive Logits
    Token              Logit
    " etc"             1.28
    "etc"              1.15
    "whatever"         0.95
    "ĪĴ"               0.82
    " blah"            0.80
    " whatever"        0.76
    " respectively"    0.75
    " Allah"           0.74
    " whichever"       0.74
    "=-=-=-=-"         0.71
    Activation Density: 0.397%

    No Known Activations