    NEGATIVE LOGITS
    " furent"       0.73
    "되었다"         0.72
                    0.67
    " wurden"       0.64
    "されている"       0.64
    " 되었다"        0.64
    " furono"       0.62
    " byly"         0.60
    "变为"          0.60
    " Prove"        0.60
    POSITIVE LOGITS
    " tried"        0.74
    " watched"      0.68
    " known"        0.67
    " heard"        0.66
    " imagined"     0.65
    " researched"   0.63
    " talked"       0.63
    " admired"      0.62
    " considered"   0.62
    " thought"      0.61
    Activation Density: 0.133%

    No Known Activations