    Explanations

    abstract concepts and scenarios

    Negative Logits
     таких      1.63
    这一        1.59
     такими     1.54
     이러한     1.51
    这种        1.50
     této       1.47
     цьому      1.45
     tejto      1.44
    這種        1.44
     이런       1.43
    Positive Logits
     will       1.33
     has        1.30
     VERY       1.26
     very       1.23
     terribly   1.22
     hebben     1.20
     have       1.18
     belongs    1.16
     heeft      1.16
    めっちゃ    1.15
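    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix, then keeping the tokens with the largest and smallest results. The sketch below assumes that approach; the matrix layout (d_model rows by vocab columns), the function names `logit_effects` and `top_logits`, and the toy values are all hypothetical and are not taken from this page.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through an unembedding matrix.

    Assumption (hypothetical layout): `unembed` has d_model rows and
    vocab_size columns, so column j is the unembedding vector of token j.
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    # Effect on token j = dot product of the decoder direction with column j.
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_logits(tokens: Sequence[str], effects: Sequence[float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest (most negative) effects first
    positive = ranked[::-1][:k]    # largest (most positive) effects first
    return positive, negative


# Toy example with a hypothetical 2-dimensional model and 3-token vocabulary.
tokens = ["a", "b", "c"]
effects = logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [3.0, 3.0, 3.0]])
pos, neg = top_logits(tokens, effects, k=1)
print(pos, neg)  # [('c', 2.0)] [('b', -1.0)]
```

    In this sketch negative effects keep their sign; a dashboard may choose to display magnitudes instead.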
    Activation Density 0.799%
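    Activation density is usually read as the share of tokens on which the feature fires. At 0.799%, that is roughly 8 tokens in every 1,000. The sketch below assumes a feature "fires" when its activation exceeds a threshold of 0; the threshold, the function name `activation_density`, and the sample activations are hypothetical, and the dataset behind the figure above is not stated on this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical activations over eight tokens; only one is nonzero.
density = activation_density([0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0])
print(f"{density:.3f}%")  # 12.500%
```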

    No Known Activations