INDEX
    Explanations

    specific characters or symbols

    Negative Logits
    Token       Logit
    ĸļ          -0.86
    itism       -0.81
    kers        -0.79
    ppelin      -0.79
    omnia       -0.79
    iday        -0.75
    esan        -0.73
    ocene       -0.72
    eson        -0.72
    iflower     -0.71

    Positive Logits
    Token                       Logit
    âĶĢâĶĢâĶĢâĶĢ                1.21
    âĶĢâĶĢ                      1.14
    âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ    1.11
    âķIJâķIJ                      1.02
    âĶĢ                         0.84
    --------                    0.79
    Record                      0.77
    aneous                      0.77
    BALL                        0.75
    Ķ                           0.74
    Activation Density: 0.010%

    No Known Activations
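
Several of the token strings listed above (for example `âĶĢ`) look like entries from a byte-level BPE vocabulary, where each raw byte is shown as a stand-in printable character. The sketch below assumes the GPT-2 style byte-to-character table, which is an assumption: the source does not name the tokenizer. The helper names `bytes_to_unicode` and `decode_token` are illustrative. It rebuilds the table and maps a displayed token back to the text it encodes.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> raw byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed vocabulary string back to text.

    Tokens that hold only part of a UTF-8 sequence decode to U+FFFD.
    Characters outside the table raise KeyError.
    """
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĶĢ", "--------", "Record"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, `âĶĢ` corresponds to the bytes E2 94 80, which is the box-drawing character U+2500, so the top positive logits would be runs of horizontal rule characters. That would be consistent with the "specific characters or symbols" explanation. Single-character entries such as `Ķ` hold only part of a UTF-8 sequence and do not decode to a character on their own.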