    Explanations

    references to specific metrics or points

    Negative Logits
      Token (quoted; a leading space is part of the token)   Logit
      " Princip"                                              -0.22
      " principals"                                           -0.16
      "kit"                                                   -0.16
      "AEA"                                                   -0.16
      "lore"                                                  -0.15
      " зал"                                                  -0.15
      "rats"                                                  -0.15
      "beit"                                                  -0.15
      "òng"                                                   -0.14
      "ικ"                                                    -0.14
    Positive Logits
      Token (quoted; a leading space is part of the token)   Logit
      "blank"                                                 0.29
      "Blank"                                                 0.28
      " Blank"                                                0.28
      " blank"                                                0.26
      "-of"                                                   0.24
      "lessly"                                                0.24
      "sett"                                                  0.22
      "ill"                                                   0.21
      "y"                                                     0.20
      " guard"                                                0.20
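    The page does not say how these lists are produced. Dashboards of this kind commonly score each vocabulary token by the dot product of the feature's decoder direction with that token's column of the unembedding matrix, then report the most negative and most positive scores. The sketch below assumes exactly that; it ignores any final layer norm, and the function name `logit_effects` and the toy 2-dimensional vectors are hypothetical and for illustration only.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]],
                  k: int = 10) -> Tuple[Scored, Scored]:
    """Hypothetical helper: score each token by the dot product of the
    feature's decoder direction with that token's unembedding column
    (assumed method, no layer norm), then return the k most negative
    and k most positive (token, score) pairs."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    # Sort ascending so the most negative scores come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-d vectors, made up for illustration; not the real model's weights.
    toy_unembed = {
        " blank": [0.3, 0.0],
        "kit": [-0.1, 0.1],
        "lore": [0.0, 0.2],
        "ill": [0.2, 0.1],
    }
    neg, pos = logit_effects([1.0, -0.5], toy_unembed, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

    With the toy vectors, "kit" and "lore" come out most negative and " blank" and "ill" most positive. Real tables would use the model's full unembedding matrix and k = 10.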
    Activation Density: 0.020%
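    Read as the fraction of token positions on which the feature fires, 0.020% is 0.0002, or roughly 1 token in 5,000. The sketch below assumes "fires" means an activation strictly above zero; the function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions whose activation
    is strictly greater than `threshold` (assumed firing criterion)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Made-up sample: 2 firing positions out of 10,000 tokens.
    sample = [0.0] * 9_998 + [1.3, 0.7]
    density = activation_density(sample)
    print(f"{density:.3%}")
```

    With this made-up sample of 2 firing positions in 10,000, the printed density matches the 0.020% shown above.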

    No Known Activations