INDEX
    Explanations

    celebrities and celebrity

    NEGATIVE LOGITS
    "Skip"          0.46
    "olog"          0.40
    "abilidade"     0.40
    "abilidad"      0.40
    "astica"        0.39
    "ure"           0.37
    " skips"        0.37
    "HEP"           0.37
    "avia"          0.36
    "avía"          0.36
    POSITIVE LOGITS
    " celeb"        0.87
    " Cele"         0.84
    " celebrities"  0.81
    " celebrity"    0.80
    " celebr"       0.79
    "Cele"          0.77
    "Celebr"        0.76
    " celebs"       0.76
    " Celebr"       0.75
    "celebr"        0.73
    Act Density 0.002%

    No Known Activations