INDEX
    Explanations

    the presence of specific character sequences that correspond to names or identifiers

    Negative Logits
    ovid
    -0.15
    pez
    -0.15
    .ov
    -0.15
    akis
    -0.15
    евич
    -0.14
    ongan
    -0.14
    preh
    -0.14
    etr
    -0.14
    πλα
    -0.14
    achs
    -0.13
    Positive Logits
    al
    0.17
     rent
    0.16
     Princip
    0.16
    xis
    0.15
    anus
    0.15
    scaled
    0.15
    alah
    0.14
    Bold
    0.14
    ulia
    0.14
    倍
    0.14
    Act Density 0.003%

    No Known Activations