    Explanations

    code comments and definitions

    Negative Logits
      [ self]                1.10
      [self]                 1.03
      [ Self]                0.82
      [ само]                0.82
      (token not rendered)   0.77
      [Self]                 0.76
      [ zelf]                0.73
      [ själv]               0.70
      [ עצ]                  0.68
      [ SELF]                0.68
    Positive Logits
      [/**]                  0.70
      [ /**]                 0.60
      [/**/*]                0.52
      [ !***]                0.52
      [$$\]                  0.46
      [ """]                 0.45
      [ ث]                   0.43
      [""".]                 0.43
      [τρα]                  0.42
      ["""]                  0.41
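Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative entries. This page does not say how its lists were computed, so the sketch below is only an illustration of that common convention. `top_logit_effects`, `decoder_dir`, `W_U` and `vocab` are hypothetical names, and the toy numbers are invented, not taken from this feature. The sketch keeps the sign of each value, while the lists above show unsigned numbers.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature's decoder direction raises
    (positive) or lowers (negative) their logits.

    Hypothetical helper (assumed convention, not confirmed by the page):
      decoder_dir: shape (d_model,), the feature's decoder vector
      W_U:         shape (d_model, vocab_size), the unembedding matrix
      vocab:       list mapping token id -> token string
    """
    logit_effect = decoder_dir @ W_U          # shape (vocab_size,)
    order = np.argsort(logit_effect)          # token ids, ascending by effect
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: the second row of W_U is orthogonal to decoder_dir,
# so only the first row contributes to the logit effects.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -1.0, 0.2, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
vocab = ["/**", " self", '"""', "self"]
positive, negative = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(positive)  # [('/**', 0.5), ('"""', 0.2)]
print(negative)  # [(' self', -1.0), ('self', 0.0)]
```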
    Activation Density 0.008% (active on roughly 8 of every 100,000 tokens)
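Activation density is usually reported as the percentage of tokens in a sample on which the feature's activation is above zero. Below is a minimal sketch under that assumption. `activation_density` and its zero `threshold` are hypothetical choices, not taken from this page. The second call shows that 8 active tokens out of 100,000 gives the 0.008% figure.

```python
import numpy as np

def activation_density(feature_acts, threshold=0.0):
    """Percentage of tokens on which a feature is active.

    feature_acts: 1-D sequence of the feature's activation on each token
    of a sample corpus. A token counts as active when its activation is
    strictly greater than `threshold` (0.0 here, an assumption).
    """
    acts = np.asarray(feature_acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

print(activation_density([0.0, 0.0, 3.2, 0.0]))  # 25.0

acts = np.zeros(100_000)
acts[:8] = 1.0                      # feature fires on 8 tokens
print(activation_density(acts))     # ~0.008 (percent)
```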

    No Known Activations