    Explanations

    mentions of specific celebrities, particularly Brad Pitt and Angelina Jolie

    Negative Logits
    Token         Logit
    "oon"         -0.16
    "czy"         -0.16
    "ASE"         -0.15
    "åľŃ"         -0.15
    " liá»ģn"      -0.15
    "oons"        -0.15
    "emics"       -0.14
    "rys"         -0.14
    " mate"       -0.13
    "pers"        -0.13
    Positive Logits
    Token         Logit
    "eref"        0.17
    "pedo"        0.16
    "bane"        0.16
    "ileri"       0.15
    "IMIT"        0.15
    "ungen"       0.14
    "Shift"       0.14
    "URE"         0.14
    " Shift"      0.14
    "acket"       0.14
    Activation Density: 0.001%

    No Known Activations