    Explanations

    items separated by commas or 'or'

    Negative Logits
     decree     0.47
     position   0.45
     form       0.44
     getter     0.44
     law        0.43
     world      0.42
    ders        0.42
     comprim    0.42
     deci       0.42
     process    0.41
    Positive Logits
     “‘                 0.94
    (no token shown)    0.91
     %``                0.86
    (no token shown)    0.78
    (no token shown)    0.77
     "'                 0.77
    (no token shown)    0.75
     `                  0.74
     "                  0.73
     "¿                 0.72
    Act Density 0.366%

    No Known Activations