INDEX
    Explanations

    references to stars or celebrities

    instances of the word "star"

    Negative Logits
    Token           Logit
    "apons"         -1.01
    "»Ĵ"            -1.00
    "nsic"          -0.95
    "odcast"        -0.92
    "veyard"        -0.88
    "berra"         -0.88
    "ibaba"         -0.85
    "Downloadha"    -0.85
    "ĵĺ"            -0.85
    "iblings"       -0.84
    Positive Logits
    Token           Logit
    " star"         1.13
    " stars"        1.12
    "star"          0.92
    "stars"         0.92
    "light"         0.85
    " attraction"   0.84
    "liner"         0.83
    "lit"           0.78
    "burst"         0.75
    "lite"          0.75
    Activation Density: 0.011%

    No Known Activations