INDEX
    NEGATIVE LOGITS
    Token          Logit
    "aby"          -0.18
    "ostat"        -0.18
    "stadt"        -0.17
    " Platt"       -0.16
    "terra"        -0.16
    "åŀ"           -0.15
    "iker"         -0.15
    "upp"          -0.15
    "ôle"          -0.15
    "alla"         -0.14

    POSITIVE LOGITS
    Token          Logit
    "bomb"          0.22
    " Warner"       0.22
    " honored"      0.22
    " capsule"      0.21
    " honoured"     0.21
    " dilation"     0.20
    "hon"           0.20
    " warp"         0.20
    "Machine"       0.20
    "_machine"      0.19

    Activation Density: 0.045%

    No Known Activations