INDEX
    Explanations

    Indicators of boolean properties or states, particularly those indicating "false."

    Negative Logits
     Infórmanos
    -0.59
    -0.50
    eramente
    -0.47
    mbangan
    -0.45
    usiai
    -0.45
    ternut
    -0.43
     متعلقه
    -0.43
     anlam
    -0.42
    __":\n
    -0.42
    vedades
    -0.42
    Positive Logits
     false
    2.92
    false
    2.50
     False
    2.34
    False
    2.19
     FALSE
    1.91
    FALSE
    1.70
     falsos
    1.66
     falso
    1.66
     falsas
    1.59
     falsa
    1.55
    Act Density 0.003%

    No Known Activations