INDEX
    Explanations

    occurrences of specific non-standard characters or symbols

    Negative Logits
    Token          Logit
    "bbe"          -0.15
    "ietet"        -0.14
    "ifi"          -0.14
    " cap"         -0.14
    "alice"        -0.14
    " RAW"         -0.14
    "onta"         -0.14
    "ourt"         -0.14
    "roman"        -0.14
    "ytt"          -0.13

    Positive Logits
    Token          Logit
    "ÏĮÏĤ"         0.15
    " Semantic"    0.15
    "enticator"    0.14
    "peater"       0.14
    "itar"         0.14
    "åŃ"           0.14
    "ÑĤий"         0.14
    " nid"         0.14
    "ictory"       0.14
    "phere"        0.14

    Activation Density: 0.004%

    No Known Activations