INDEX
    Explanations

    names of famous pop culture figures or groups

    references to specific musical artists and cultural entities

    Negative Logits
    itals        -0.86
    lished       -0.84
    lessly       -0.83
    ital         -0.81
    merce        -0.81
    itism        -0.79
    haps         -0.78
    orously      -0.77
    ificial      -0.77
    istically    -0.76

    Positive Logits
     Wings        1.06
     Hearts       1.05
     Ducks        0.98
     Feet         0.96
     Ones         0.95
     Bears        0.92
     Birds        0.86
     Hands        0.86
     Bones        0.86
     Duck         0.86
    Activation Density: 0.148%

    No Known Activations