INDEX
    Explanations

    mentions of a specific person's name

    Negative Logits
    iques     -0.92
    itudes    -0.80
    itone     -0.80
    bre       -0.76
    reme      -0.76
     belts    -0.75
    minus     -0.74
    sworth    -0.73
    things    -0.71
    ique      -0.71
    Positive Logits
    æ°        0.86
     Cheong   0.85
    UTC       0.80
    uthor     0.78
     Karin    0.78
    OPLE      0.77
     raft     0.77
    éĸ        0.76
    ADE       0.75
    士        0.75
    Activation Density: 0.022%

    No Known Activations