INDEX
    Explanations

    references to people and their roles or titles in professional contexts

    Negative Logits
    Token    Logit
    ”:       -0.18
    ”,       -0.18
    ”;       -0.17
     â̦↵     -0.17
    }:       -0.17
    *,       -0.17
    ”),      -0.17
    !),      -0.17
    “,       -0.17
    **,      -0.17
    Positive Logits
    Token    Logit
    .        0.48
    .ï¼ı     0.19
    .`       0.18
    .:.:.    0.18
    .?       0.18
    ....     0.17
    pagen    0.17
    .!       0.17
    .↵       0.16
    .à¸ŀ     0.16
    Activation Density: 0.022%

    No Known Activations