    Explanations

    names or mentions of a specific person

    variations of the word "take."

    Negative Logits
    Token          Logit
    "ャ"           -0.74
    "enegger"      -0.69
    "oad"          -0.67
    " Weasley"     -0.63
    "ァ"           -0.60
    "cfg"          -0.59
    "aldi"         -0.59
    "fired"        -0.57
    "inatory"      -0.56
    "swick"        -0.55
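Two tokens in the negative list are small katakana characters (ャ and ァ). GPT-2-style tokenizers store every token in a byte-level form, mapping each UTF-8 byte to a printable character, so ャ is stored as "ãĥ£", ァ as "ãĤ¡", and a leading space appears as "Ġ". The sketch below assumes the model uses GPT-2's byte-level BPE vocabulary (the token strings suggest this, but the dashboard does not say so). It rebuilds that byte-to-character table to recover readable text; `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """Rebuild GPT-2's byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are kept as their own Latin-1 character.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (space, control characters, ...) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map a byte-level vocab string back to bytes, then UTF-8."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ£"))       # ャ
print(decode_token("ãĤ¡"))       # ァ
print(repr(decode_token("ĠWeasley")))  # ' Weasley'
```

The same decoding explains why " Weasley" carries a leading space: its vocabulary entry begins with "Ġ", the stand-in for byte 0x20.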
    Positive Logits
    Token          Logit
    "Maker"         0.81
    "warm"          0.80
    "akes"          0.78
    "ñ"             0.78
    "aimon"         0.77
    "asy"           0.73
    "yon"           0.73
    "velop"         0.73
    "yi"            0.72
    "maker"         0.71
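Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that method. The dashboard does not state how its lists were computed or whether either matrix is normalized. `feature_logits`, `w_dec`, `w_u`, and the toy vocabulary are hypothetical, and the random weights stand in for real model parameters.

```python
import numpy as np


def feature_logits(w_dec, w_u, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    w_dec: (d_model,) decoder direction of one feature (assumed input).
    w_u:   (d_model, d_vocab) unembedding matrix (assumed input).
    """
    # Direct effect of the feature's decoder direction on every output logit.
    scores = w_dec @ w_u                 # shape (d_vocab,)
    order = np.argsort(scores)           # token ids, most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy stand-ins for real model weights (hypothetical values).
rng = np.random.default_rng(0)
d_model = 8
vocab = [f"tok{i}" for i in range(50)]
w_dec = rng.normal(size=d_model)
w_u = rng.normal(size=(d_model, len(vocab)))

negative, positive = feature_logits(w_dec, w_u, vocab, k=3)
print("Negative:", negative)
print("Positive:", positive)
```

With real weights, `vocab` would be the tokenizer's vocabulary and `w_u` the model's unembedding, so `negative` and `positive` would play the role of the two lists above.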
    Activation Density: 0.033%

    No Known Activations
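Activation density is usually the fraction of sampled tokens on which a feature fires. The dashboard gives neither its threshold nor its sample size. Assuming "fires" means an activation above zero, a density of 0.033% corresponds to roughly 33 active tokens per 100,000, which this sketch reproduces. `activation_density` and the synthetic activations are hypothetical.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    return float(np.mean(acts > threshold))


# Synthetic activations: 33 active tokens out of 100,000 (hypothetical data).
acts = np.zeros(100_000)
acts[:33] = 1.0

print(f"{activation_density(acts):.3%}")  # 0.033%
```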