INDEX
    Explanations

    pronouns and references to personal identity

    Negative Logits
    åµ            -0.16
    inand         -0.15
    æ¡Ĥ           -0.15
    itag          -0.15
     Justin       -0.15
    ugin          -0.15
    åİ»äºĨ        -0.15
     Bowen        -0.15
     away         -0.15
     Vin          -0.15
    Positive Logits
    381           0.16
    947           0.16
    oise          0.16
     Beg          0.16
    387           0.16
    eon           0.15
    heits         0.15
    OLDER         0.15
    Interceptor   0.14
    zsche         0.14
    Act Density 0.010%

    No Known Activations