    Explanations

    references to significant historical events or societal changes

    Negative Logits
    " complies"    -0.45
    "avoid"        -0.45
    " avoid"       -0.41
    " remain"      -0.41
    " avoiding"    -0.40
    " preven"      -0.39
    " Avoiding"    -0.39
    " remained"    -0.38
    " await"       -0.38
    " избе"        -0.37

    Positive Logits
    " trouxe"      0.69
    " mengubah"    0.62
    " bring"       0.59
    " brings"      0.59
    " bringing"    0.58
    " brought"     0.57
    "Bring"        0.57
    " Bring"       0.56
    " trajo"       0.55
    " exposing"    0.54
    Activation Density: 0.119%

    No Known Activations