INDEX
    Explanations

    mentions of specific values or ratings

    instances of a specific character or symbol

    Negative Logits
    raints            -0.84
     Seym             -0.83
     mathemat         -0.75
     disadvant        -0.74
     misunder         -0.74
     trainers         -0.72
     Enlightenment    -0.70
     condem           -0.70
     pestic           -0.69
    enegger           -0.69
    Positive Logits
    ï¸ı               1.29
    lean              1.00
    log               0.91
    ï¸                0.86
    ĺ                 0.86
    ģ                 0.82
    £                 0.82
    rd                0.81
    deg               0.81
    Ģ                 0.80
    Act Density 0.034%

    No Known Activations