INDEX
    Explanations

    1 followed by numbers or punctuation

    Negative Logits
    ak         0.68
    is         0.64
    u          0.64
    1          0.63
    2          0.63
    ik         0.59
    おります   0.58
    5          0.58
    ok         0.57
    9          0.57
    Positive Logits
     unimagin  0.54
     ardent    0.52
    s          0.50
     mittler   0.48
     hermit    0.46
     côté      0.45
               0.45
     lovable   0.45
     avvic     0.45
     hơn       0.45
    Activation Density: 0.641%

    No Known Activations