INDEX
    Explanations

    proper nouns or names

    mentions of specific individuals or names

    Negative Logits
    town          -0.66
    ongyang       -0.64
    Ģ             -0.63
    arms          -0.60
     pride        -0.60
    HAEL          -0.59
     Kinnikuman   -0.59
    ridges        -0.58
     cheeks       -0.57
    orie          -0.57
    Positive Logits
    ited          0.93
    vironment     0.93
    cend          0.91
    thal          0.90
    ction         0.89
    swer          0.83
    issance       0.83
    ity           0.82
    emies         0.82
    cing          0.81
    Act Density 0.025%

    No Known Activations