INDEX
    Explanations

    phrases emphasizing a specific belief or understanding

    Negative Logits
    Token       Logit
    "backer"    -0.76
    "hens"      -0.73
    "arthed"    -0.73
    "adle"      -0.72
    "aukee"     -0.70
    "guard"     -0.68
    "swick"     -0.67
    "ensed"     -0.66
    "eng"       -0.66
    "ante"      -0.66
    Positive Logits
    Token            Logit
    " somehow"       0.85
    " they"          0.80
    " someday"       0.76
    " THEY"          0.73
    " unless"        0.73
    " justifies"     0.71
    " anyone"        0.71
    " everything"    0.70
    " everyone"      0.70
    " there"         0.69
    Activation Density: 0.170%

    No Known Activations