    Explanations

    phrases indicating consistency and alignment with certain standards or expectations

    Negative Logits
    Token (quoted)   Logit
    "Katso"          -0.47
    " Gefahr"        -0.47
    "$_['"           -0.47
    "marvin"         -0.46
    " LoginPage"     -0.46
    "waltung"        -0.45
    " îna"           -0.45
    "alamus"         -0.44
    "WebServlet"     -0.44
    " Geographie"    -0.44
    Positive Logits
    Token (quoted)   Logit
    " Consistent"     0.95
    " consistent"     0.95
    "Consistent"      0.88
    "consistent"      0.85
    " consistency"    0.81
    " Consistency"    0.80
    " Consist"        0.79
    "Consistency"     0.74
    " consistently"   0.73
    " inconsistent"   0.72
    Activation density: 0.017%
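    For a sense of scale, a density of 0.017% means the feature fires on roughly 17 of every 100,000 tokens. The sketch below assumes activation density is the percentage of tokens whose activation is strictly positive. The helper name `activation_density` and the zero threshold are illustrative assumptions, not the dashboard's actual code.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = (active tokens / all tokens) * 100,
    where "active" means activation > threshold (0.0 by default).
    """
    if not activations:
        return 0.0  # no tokens observed, so report zero density
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 17 active tokens out of 100,000 reproduces the density reported above.
sample = [0.0] * 99_983 + [1.3] * 17
print(f"{activation_density(sample):.3f}%")  # prints 0.017%
```

    A threshold parameter is exposed because some tools count a token as active only above a small positive cutoff; under that assumption, raising the threshold can only lower the reported density.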

    No Known Activations