INDEX
    Explanations

    words related to expressing thoughts and opinions

    expressions of personal agency and emotional experiences

    Negative Logits
    " themselves"      -0.67
    " apiece"          -0.64
    " respectively"    -0.63
    "idates"           -0.60
    "Their"            -0.57
    " tariffs"         -0.51
    " turnover"        -0.51
    " Us"              -0.50
    "idges"            -0.49
    " populous"        -0.48
    Positive Logits
    " myself"      1.88
    " my"          1.39
    "My"           0.89
    " MY"          0.86
    "my"           0.83
    " My"          0.83
    " blogging"    0.74
    " mine"        0.74
    " am"          0.70
    " I"           0.67
    Activation Density: 1.211%

    No Known Activations