    Explanations

    words related to neutrality or lack of bias

    content related to neutrality and its various implications

    Negative Logits
    Token           Logit
    "KER"           -0.73
    "monary"        -0.72
    "hower"         -0.71
    "teenth"        -0.66
    "teen"          -0.64
    "inary"         -0.64
    "Requires"      -0.62
    "buck"          -0.62
    " painfully"    -0.62
    "urrent"        -0.62
    Positive Logits
    Token           Logit
    " confines"     0.90
    " zone"         0.87
    " toward"       0.83
    " stance"       0.80
    " reception"    0.79
    " towards"      0.79
    "glers"         0.77
    " environment"  0.77
    " demeanor"     0.74
    " matchups"     0.71
    Activation Density: 0.078%

    No Known Activations