    Explanations

    symbols or special characters used in formatting or notation

    Negative Logits
    Token           Logit
    " Peb"          -0.83
    " scattering"   -0.72
    " princ"        -0.72
    "omething"      -0.69
    " tremend"      -0.67
    " Afric"        -0.66
    " nearest"      -0.65
    " Manhattan"    -0.64
    "nown"          -0.64
    " dangling"     -0.64
    Positive Logits
    Token   Logit
    "º"     1.37
    "¹"     1.24
    "į"     1.17
    "Į"     1.16
    "£"     1.16
    "Ĵ"     1.15
    "§"     1.14
    "¬"     1.12
    "ı"     1.11
    "ij"     1.08
    Activation Density: 0.154%

    No Known Activations