    Explanations

    mentions of personal identity and name references

    the speaker or author

    Negative Logits
    ſſung           -0.77
    featureID       -0.77
    aarrggbb        -0.69
     ſeveral        -0.68
     ſol            -0.68
     utafitiHapana  -0.68
     ſta            -0.68
    ðsíða           -0.67
     ſei            -0.67
     ſoll           -0.65
    Positive Logits
     my             0.38
     meinen         0.35
     me             0.32
    staw            0.30
     myself         0.30
                    0.29
    我的            0.28
    Teilen          0.28
    #!/             0.27
     miei           0.26
    Activation Density: 0.352%

    No Known Activations