INDEX
    Explanations

    instances of encouragement or supportive language

    Negative Logits
    Token       Logit
    " بيها"     -0.84
    "id"        -0.68
    "Bae"       -0.66
    " Bae"      -0.66
    "en"        -0.66
    "println"   -0.64
    "machte"    -0.64
    "quad"      -0.63
    " Koz"      -0.62
    " Фа"       -0.62
    Positive Logits
    Token               Logit
    "couragement"       1.15
    "couraged"          1.14
    " discouraged"      1.05
    "encouragement"     1.02
    "couraging"         0.98
    " discourage"       0.95
    " encouragement"    0.94
    "multer"            0.93
    " encouraged"       0.91
    " Encourage"        0.88
    Activation Density: 0.008%

    No Known Activations