    NEGATIVE LOGITS
    pt          0.55
    T           0.55
    il          0.54
    ai          0.51
    ua          0.50
    uin         0.49
    ank         0.47
                0.47
    uk          0.46
    خ           0.46
    POSITIVE LOGITS
     wondered    1.00
     wondering   0.91
     wonder      0.86
     wonders     0.78
    Wonder       0.77
     dudas       0.77
    wonder       0.77
    🤔           0.74
     "¿          0.70
     duda        0.70
    Activation Density: 0.037%

    No Known Activations