INDEX
    NEGATIVE LOGITS
    " muualla"        -0.47
    " ratio"          -0.45
    "uolo"            -0.44
    " Means"          -0.42
    " means"          -0.42
    "steuer"          -0.41
    " enlargement"    -0.40
    " Ratios"         -0.40
    "DISABLE"         -0.40
    "はじめに"            -0.40
    POSITIVE LOGITS
    " literally"      1.33
    "literally"       1.26
    " Literally"      1.24
    "Literally"       1.19
    " literal"        1.10
    " literalmente"   1.05
    " буквально"      0.87
    "literal"         0.87
    " Literal"        0.83
    " litté"          0.81
    Activation Density: 0.091%

    No Known Activations