INDEX
    Explanations

    endings or exclamations

    NEGATIVE LOGITS
    /    0.70
     used    0.55
    0.52
     use    0.52
     uses    0.52
     using    0.51
     verwendet    0.50
     indicates    0.49
     generally    0.49
     typically    0.49
    POSITIVE LOGITS
     вопло    0.54
    无数    0.51
    !    0.47
     그것    0.46
     jamás    0.46
    !"    0.45
     любви    0.45
     człowie    0.45
    !..    0.44
    !”    0.44
    Act Density 2.088%

    No Known Activations