INDEX
    Explanations

    The word "completely" followed by a descriptor

    NEGATIVE LOGITS
    0.68  ن
    0.64  માં
    0.62
    0.61   prominently
    0.59   but
    0.59  ",
    0.59   često
    0.58   crimen
    0.58   wellknown
    0.57  '];
    POSITIVE LOGITS
    0.92  完全
    0.80   полностью
    0.79   completely
    0.79  完全に
    0.79   Completely
    0.76  completely
    0.75   完全
    0.75   volledig
    0.72  彻底
    0.68   완전히
    Act Density 0.057%

    No Known Activations