INDEX
    Explanations
    NEGATIVE LOGITS
    Token    Logit
    ك        1.95
    a        1.90
    ه        1.88
    ان       1.84
    an       1.75
    o        1.65
    ка       1.59
    1        1.58
    e        1.48
    س        1.47
    POSITIVE LOGITS
    Token                   Logit
    (token not rendered)    1.13
    (token not rendered)    0.96
    。「                    0.95
    古墳                    0.94
    有所                    0.93
    看待                    0.91
    τή                      0.90
    <0x80>                  0.90
    (token not rendered)    0.89
    r                       0.89
    Activation density: 0.009%
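    Reading activation density as the percentage of sampled tokens on which the feature's activation is positive (an assumption about how this dashboard computes it), 0.009% works out to roughly 9 tokens per 100,000. The sketch below illustrates that arithmetic. The `activation_density` helper, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes "density" means the fraction of sampled
    tokens on which the feature fires (activation > threshold).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up sample: the feature fires on 9 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(9):
    acts[i * 10_000] = 1.0

print(f"{activation_density(acts):.3f}%")  # 0.009%
```

    A feature this sparse may fire on no tokens at all in a small sample, which is consistent with a dashboard reporting no stored activation examples.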

    No Known Activations