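    Negative Logits
        Token         Logit
        'Note'        0.66
        'NOTE'        0.66
        ' note'       0.64
        'note'        0.63
        ' ***",'      0.56
        ' NOTE'       0.52
        ' **,'        0.51
        'Notas'       0.49   (Spanish/Portuguese: "notes")
        'Nota'        0.48   (Spanish/Portuguese/Italian: "note")
        'Примітки'    0.46   (Ukrainian: "notes")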
    Positive Logits (↵ = newline)
        Token                 Logit
        '↵↵↵↵'                1.16
        '↵↵↵'                 1.14
        '↵↵↵↵↵'               1.00
        '↵↵↵↵↵↵'              0.91
        '↵↵↵↵↵↵↵'             0.88
        ' Especially'         0.77
        '↵↵↵↵↵↵↵↵'            0.76
        '↵↵↵↵↵↵↵↵↵'           0.75
        '↵↵↵↵↵↵↵↵↵↵↵↵↵'       0.71
        ' :)'                 0.69
    Activation density: 0.574%

    No known activations.