INDEX
    Explanations
    Negative Logits
     CreateTagHelper
    -0.98
     للمعارف
    -0.80
     مرئيه
    -0.78
    脚注の使い方
    -0.75
     ModelExpression
    -0.74
     Drink
    -0.74
     &___
    -0.71
     utafitiHapana
    -0.71
     HttpNotFound
    -0.71
    +#+#
    -0.69
    Positive Logits
     clean
    0.54
     plain
    0.51
     clear
    0.48
     non
    0.46
     free
    0.45
     чи
    0.44
     un
    0.42
    Facades
    0.41
     pure
    0.41
     cool
    0.40
    Activation Density: 0.001%

    No Known Activations