INDEX
    Explanations
    Negative Logits
    " Kar"      -0.75
    " kar"      -0.71
    " Ma"       -0.71
    " ma"       -0.63
    " Anna"     -0.59
    "s"         -0.57
    " Tra"      -0.53
    " Harry"    -0.53
    "SA"        -0.52
    "Kar"       -0.52
    Positive Logits
    " Efq"         0.91
    "ſelves"       0.85
    " iſt"         0.84
    " Monfieur"    0.81
    " myſelf"      0.79
    "ſelf"         0.77
    "出版年"       0.75
    " itſelf"      0.75
    " étoit"       0.74
    " stiefel"     0.74
    Activation Density: 0.055%

    No Known Activations