INDEX
    Explanations

    mentions of roles, titles, and positions of individuals

    Negative Logits
    hill        -0.16
    åIJ         -0.15
    jian        -0.15
    stav        -0.14
    بÙĪØ±      -0.14
    ,↵↵↵↵       -0.13
    pent        -0.13
    .html       -0.13
    Pure        -0.13
    agar        -0.13
    Positive Logits
    ibase       0.20
    ffe         0.17
    ekl         0.15
    716         0.15
     hosts      0.15
     STRICT     0.15
    regunta     0.15
    å¹¹ç·ļ      0.15
     дÑĥÑĪ     0.14
    髪          0.14
    Act Density 0.146%

    No Known Activations