    Explanations

    references to the actor Tom Cruise

    Negative Logits
      "ĥ½"        -1.94
      "rapeut"    -1.69
      "¼"         -1.60
      " death"    -1.58
      "ulls"      -1.54
      "ĻĤ"        -1.53
      "plasia"    -1.51
      " male"     -1.50
      "mes"       -1.46
      "ij"        -1.45

    Positive Logits
      "ulence"     1.50
      "DOM"        1.45
      "antry"      1.45
      "dale"       1.43
      "esp"        1.43
      " Driver"    1.43
      "als"        1.41
      "ello"       1.40
      " quotes"    1.40
      "blogger"    1.39
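    The page does not say how these token lists are produced. Dashboards of this kind often rank vocabulary tokens by a feature's direct effect on the output logits. The sketch below is a minimal illustration under that assumption. It projects a hypothetical decoder direction `w_dec` onto a hypothetical unembedding matrix `W_U` and keeps the k most negative and k most positive scores. Both names and the random values are illustrative and do not come from this page.

```python
import numpy as np

def top_logit_effects(w_dec: np.ndarray, W_U: np.ndarray, k: int = 10):
    """Rank tokens by direct logit effect (assumed method, for illustration).

    w_dec: hypothetical feature decoder direction, shape (d_model,)
    W_U:   hypothetical unembedding matrix, shape (d_model, d_vocab)
    Returns (top_positive, top_negative, scores), where the index arrays
    are ordered from most extreme to least extreme.
    """
    scores = w_dec @ W_U              # one score per vocabulary token
    order = np.argsort(scores)        # ascending order of scores
    top_negative = order[:k]          # most negative first
    top_positive = order[::-1][:k]    # most positive first
    return top_positive, top_negative, scores

# Hypothetical toy dimensions and random weights.
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 100
w_dec = rng.standard_normal(d_model)
W_U = rng.standard_normal((d_model, d_vocab))
pos, neg, scores = top_logit_effects(w_dec, W_U, k=10)
print(scores[pos])
print(scores[neg])
```

    In a real dashboard, each index would be decoded back to its token string. That decoding step is how byte-level entries such as "ĥ½" or "ij" end up in lists like the ones above.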
    Activation density: 0.017%
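    The page does not define activation density. This sketch assumes the common convention that density is the fraction of tokens on which the feature's activation is nonzero. Under that assumption, 0.017% works out to about 17 firing tokens per 100,000. The activation array here is hypothetical.

```python
import numpy as np

def activation_density(acts: np.ndarray) -> float:
    """Fraction of tokens with a strictly positive activation (assumed definition)."""
    return float(np.count_nonzero(acts > 0)) / acts.size

# Hypothetical activations: 17 firing tokens out of 100,000.
acts = np.zeros(100_000)
acts[:17] = 1.0
print(f"{activation_density(acts):.3%}")  # prints 0.017%
```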

    No Known Activations