    Explanations

    pronouns/quantifiers

    Negative Logits
        Token                  Logit
        " are"                 -0.66
        " ModelExpression"     -0.66
        " للمعارف"             -0.64
        " AssemblyCulture"     -0.60
        " were"                -0.59
        " Suivez"              -0.58
        "aarrggbb"             -0.55
        " juſ"                 -0.54
        " Cliquez"             -0.54
        " seem"                -0.53
    Positive Logits
        Token                  Logit
        " Of"                  0.85
        "Of"                   0.79
        " Ways"                0.68
        " épaules"             0.68
        " Than"                0.66
        "OF"                   0.65
        " vœux"                0.62
        " Lives"               0.59
        " Sides"               0.59
        " Real"                0.59
    Activation Density: 0.674%

    No Known Activations