INDEX
    Negative Logits
        Token                  Logit
        " experiment"          -1.39
        " Experiment"          -1.38
        "experiment"           -1.28
        "Experiment"           -1.20
        " EXPERIMENT"          -1.12
        " experimentation"     -1.09
        "EXPERIMENT"           -1.05
        " experimental"        -1.03
        " Experiments"         -1.02
        " experiments"         -1.00
    Positive Logits
        Token                  Logit
        " morire"              0.49
        " credere"             0.43
        " poichè"              0.41
        " papà"                0.40
        "出來的"               0.38
        " privadas"            0.38
        " żel"                 0.38
        " sarebbero"           0.37
        " raiſ"                0.37
        "ība"                  0.37
    Act Density 0.005%

    No Known Activations