INDEX
    Explanations

    words related to importance, significance, or consequence.

    Negative Logits
    <bos>
    -1.69
    ↵↵
    -0.57
     nemlig
    -0.53
     themſelves
    -0.48
     χρήση
    -0.48
     kullanılır
    -0.47
     accanto
    -0.47
     frumos
    -0.46
     kuitenkin
    -0.46
     natale
    -0.45
    Positive Logits
    AddTagHelper
    0.94
     مشين
    0.84
    tvguidetime
    0.70
    óc
    0.69
    findpost
    0.69
    AnchorStyles
    0.68
    ).__
    0.66
    __*/
    0.65
    Sharper
    0.63
    0.62
    Act Density 1.041%

    No Known Activations