INDEX
    Explanations

    mentions of policy-related terms and discussions

    Negative Logits
    Token (quoted to show leading spaces)    Logit
    "__':"                                   -0.65
    "aneous"                                 -0.60
    " تضيفلها"                               -0.60
    " cherchés"                              -0.58
    "gister"                                 -0.56
    "تقاوى"                                  -0.55
    "🏻"                                      -0.54
    "TintMode"                               -0.54
    "żd"                                     -0.54
    "BarStyle"                               -0.53
    Positive Logits
    Token (quoted to show leading spaces)    Logit
    " policies"                              0.90
    "maker"                                  0.89
    "Policies"                               0.85
    "making"                                 0.80
    " makers"                                0.79
    " Policies"                              0.78
    "makers"                                 0.71
    "policies"                               0.69
    "holder"                                 0.68
    "holders"                                0.65
    Activation Density: 0.052%

    No Known Activations