INDEX
    Explanations

    strings or sequences related to coding or programming constructs

    Negative Logits
     myſelf         -0.88
    expandindo      -0.88
     himſelf        -0.87
    sizeCache       -0.86
     chofe          -0.86
     Spisak         -0.85
    ſelf            -0.84
    HasForeignKey   -0.84
     ISNI           -0.83
     raiſ           -0.82
    Positive Logits
    [toxicity=0]     0.52
    </strong>        0.48
                     0.47
    ństwa            0.45
    ...              0.45
    orghe            0.44
    falen            0.43
     -               0.43
     otherwise       0.42
    </b>             0.42
    Act Density 0.433%

    No Known Activations