INDEX
    Explanations

    phrases indicating admiration or strong liking for something

    phrases indicating fandom or allegiance to various subjects

    Negative Logits
    " accounted"    -0.68
    " dispatch"     -0.64
    "opard"         -0.64
    "COMPLE"        -0.61
    "hole"          -0.61
    "ItemImage"     -0.59
    " cog"          -0.58
    "?]"            -0.58
    "BUS"           -0.58
    "ural"          -0.57
    Positive Logits
    "76561"         0.85
    " sorts"        0.78
    "irlf"          0.77
    " ours"         0.69
    "mire"          0.68
    " liberty"      0.65
    "etheless"      0.65
    " whichever"    0.64
    " hers"         0.64
    " yours"        0.62
    Act Density 0.078%

    No Known Activations