INDEX
    Explanations

    pronouns or possessive words indicating relationships between different entities

    references to relationships and interactions between people

    Negative Logits
    aneously    -0.70
    stals       -0.68
    ctors       -0.67
    unts        -0.63
    stein       -0.62
    monds       -0.60
    aneous      -0.60
     172        -0.58
    mons        -0.57
    ships       -0.57
    Positive Logits
    hip         0.97
    heet        0.96
    hare        0.93
    etter       0.93
    etting      0.91
    mith        0.91
    ilver       0.90
    cape        0.81
    peed        0.80
    erver       0.79
    Activation Density: 0.294%

    No Known Activations