INDEX
    Explanations

    code and web references

    Negative Logits
    Including   0.45
    Nor         0.43
    Wake        0.43
    Ger         0.43
    Virtual     0.43
    Predicate   0.42
    Kay         0.42
    Dest        0.42
    Dirty       0.42
    Th          0.41

    Positive Logits
    .           0.74
     វា          0.54
     imp        0.54
     islam      0.53
     potion     0.53
     dot        0.52
     reform     0.51
     design     0.50
     lama       0.50
    -.          0.49
    Act Density 0.463%

    No Known Activations