INDEX
    Explanations

    mentions of personal identities or roles

    Negative Logits
    _formatted    -0.15
    ks            -0.15
    amik          -0.15
    .Batch        -0.14
    %S            -0.14
    ッシュ        -0.14
    法人          -0.14
    ppv           -0.14
     skl          -0.14
    urope         -0.13
    Positive Logits
    HN            0.17
    otech         0.15
    hn            0.15
    -widgets      0.14
     capitals     0.14
    恵            0.14
    (whitespace)  0.14
    oth           0.14
    圭            0.14
    isphere       0.14
    Activation density: 0.002%

    No Known Activations