    Negative Logits
    Token            Value
    " lieber"        0.27
    " ammonia"       0.26
    "道理"           0.26
    " જોવા"          0.26
    " overarching"   0.25
    " outing"        0.25
    " habría"        0.24
    " quy"           0.24
    " betta"         0.24
    " powdery"       0.24
    Positive Logits
    Token   Value
    "7"     0.46
    "2"     0.43
    "6"     0.42
    "8"     0.40
    "5"     0.38
    "9"     0.38
    "4"     0.36
    "3"     0.33
    ":"     0.31
    "1"     0.30
    Activation Density: 0.096%

    No Known Activations