INDEX
    Explanations

    contextual references to theoretical analysis and models

    Negative Logits
     myſelf                 -0.96
     indígen                -0.94
     desmotivaciones        -0.93
     increí                 -0.93
     ſta                    -0.90
     avoient                -0.90
     faſt                   -0.87
    ſelf                    -0.86
    ientras                 -0.85
     itſelf                 -0.85
    Positive Logits
    (token not rendered)    0.67
    (token not rendered)    0.61
    '                       0.60
     of                     0.60
    ,                       0.57
    "                       0.56
     the                    0.56
    .                       0.53
    0                       0.51
     other                  0.50
    Act Density 0.548%

    No Known Activations