INDEX
    Explanations

    smaller, simpler, fewer

    Negative Logits
    Token (quoted; a leading space is part of the token)    Logit
    "serious"                                               0.41
    "ори"                                                   0.41
    "本格"                                                  0.38
    "öny"                                                   0.38
    "servername"                                            0.38
    "難しい"                                                0.38
    " Expensive"                                            0.37
    "Expensive"                                             0.37
    " стратеги"                                             0.37
    "oncé"                                                  0.36
    Positive Logits
    Token (quoted; a leading space is part of the token)    Logit
    " smaller"                                              1.30
    " minor"                                                1.24
    " kleinere"                                             1.20
    "Smaller"                                               1.11
    " simple"                                               1.10
    " simpler"                                              1.10
    " Smaller"                                              1.09
    " sederhana"                                            1.07
    " kleiner"                                              1.05
    " 간단"                                                 1.05
    Activation density: 0.113%

    No Known Activations