INDEX
    Explanations

    positive comments or remarks

    instances of a specific character or symbol repeated throughout the text

    Negative Logits
     vulner      -0.96
     disadvant   -0.89
     mathemat    -0.82
     princ       -0.82
     accomp      -0.76
     constitu    -0.76
     fundament   -0.76
     traged      -0.74
     advis       -0.73
     sacrific    -0.73
    Positive Logits
    ï¸ı          1.12
    ï¸           0.92
    à¥           0.82
    RW           0.81
    âĶĢâĶĢ       0.79
    ı            0.79
    æľ           0.79
    \":          0.78
    lime         0.77
    ãĥ´ãĤ¡       0.75
    Act Density 0.257%

    No Known Activations