    Explanations

    phrases indicating plausibility or potential claims

    plausible claims or predictions

    Negative Logits
    Tikang            -0.57
     Comprometido     -0.53
     незавершена      -0.51
    ChildScrollView   -0.51
     CreateTagHelper  -0.50
    Camila            -0.50
    efois             -0.49
    Erfolge           -0.48
    comings           -0.48
    źródło            -0.48
    Positive Logits
     א    1.44
    א    1.23
     הא    0.99
    הא    0.73
     מא    0.72
    ָא    0.62
     וא    0.58
     בא    0.54
     שא    0.54
    0.51
    Activation Density 0.003%

    No Known Activations