INDEX
    Explanations
    NEGATIVE LOGITS
    Token          Logit
    " dialect"     -0.07
    "years"        -0.06
    " █████"       -0.06
    " Parts"       -0.06
    [blank]        -0.06
    "-head"        -0.06
    "ahr"          -0.06
    [blank]        -0.06
    " Crime"       -0.06
    [blank]        -0.06
    POSITIVE LOGITS
    Token            Logit
    " Foundation"    0.22
    " foundation"    0.18
    "Foundation"     0.14
    "foundation"     0.12
    " foundations"   0.12
    " Foundations"   0.10
    "oundation"      0.09
    "基金"           0.09
    "FOUNDATION"     0.08
    "وسف"            0.08
    Activation Density: 0.010%

    No Known Activations