INDEX
    Explanations

    reducing or breaking down barriers

    NEGATIVE LOGITS
     weren  0.34
    je  0.33
     belum  0.32
     different  0.31
     didn  0.31
     aren  0.31
    ye  0.30
    没有什么  0.30
    mo  0.30
    ighth  0.30
    POSITIVE LOGITS
     altogether  0.57
     Altogether  0.42
     최대한  0.38
     eradicate  0.37
     대신  0.36
    రిక  0.36
     hẳn  0.36
     путем  0.35
     tamamen  0.35
     entirely  0.35
    Activation Density: 0.363%

    No Known Activations