INDEX
    Explanations

    negations and phrases indicating exclusion or absence

    Negative Logits
    __':\n            -0.88
    colgroup          -0.82
    __":              -0.80
    __':              -0.71
    LabelTagHelper    -0.71
    hals              -0.71
    iastes            -0.70
    gründung          -0.69
    __":\n            -0.68
    =$?               -0.68
    Positive Logits
    G          0.58
     Kitch     0.57
    dro        0.56
    I          0.56
    d          0.55
     desple    0.54
    y          0.52
    Pog        0.51
    p          0.51
    check      0.50
    Activation Density: 0.002%

    No Known Activations