INDEX
    Explanations

    level of detail comparison

    Negative Logits
     Ech            0.43
     θεω            0.38
    utérus          0.37
     ลักษณะ          0.37
    ลักษณะ           0.37
     Travers        0.37
    '}).            0.36
    ix              0.36
    heiten          0.36
    𝑃               0.36
    Positive Logits
     downright      0.91
     outright       0.85
     zelfs          0.80
     addirittura    0.77
    甚至            0.76
     sogar          0.75
     incluso        0.73
     bahkan         0.71
     thậm           0.68
     even           0.66
    Activation Density: 0.153%

    No Known Activations