INDEX
    Explanations

    phrases related to interventions or decision-making

    instances of special characters or symbols in the text

    Negative Logits
     Brist       -0.73
    enegger      -0.72
     Thomson     -0.70
     airs        -0.67
    itsch        -0.65
     Manhattan   -0.65
     Tob         -0.64
     Strat       -0.64
     Shap        -0.63
     Borough     -0.63
    Positive Logits
    ¬    1.60
    Ļ    1.49
    ı    1.23
    ĸ    1.21
    ª    1.17
    ¾    1.16
    ļ    1.15
    ¡    1.15
    ħ    1.14
    ľ    1.13
    Activation Density: 0.384%

    No Known Activations