INDEX
    Explanations

    and followed by a new clause

    Negative Logits
    Token          Logit
    " /"           0.38
    " :("          0.38
    "😐"           0.37
    (not shown)    0.37
    "Wide"         0.37
    " >"           0.36
    " '"           0.36
    " !"           0.36
    " !}"          0.36
    "𝑚"            0.35
    POSITIVE LOGITS
    Token              Logit
    " lastly"          0.86
    "ंगाबाद"            0.71
    " yes"             0.66
    "Lastly"           0.65
    " oczywiście"      0.64
    " furthermore"     0.61
    "最後に"           0.61
    " Lastly"          0.56
    " incidentally"    0.56
    " btw"             0.56
    Act Density: 0.005%

    No Known Activations