INDEX
    Explanations

    specific characters or symbols, particularly certain Thai or Indic characters, and numeric representations

    Negative Logits
    Token          Logit
    "इटम"          -0.57
    " 们"          -0.56
    "ValueStyle"   -0.54
    "học"          -0.53
    " חיצוניים"    -0.52
    " 背影"        -0.51
    "ülle"         -0.51
    "ázaro"        -0.49
    "श्ले"          -0.48
    Positive Logits
    Token          Logit
    " Jefus"       0.61
    "ſelf"         0.60
    " laiton"      0.55
    " Theſe"       0.53
    " Majefty"     0.51
    " myſelf"      0.51
    " pleaſure"    0.49
    " juſ"         0.49
    " greateſt"    0.49
    "ſelves"       0.49
    Activation Density: 0.005%

    No Known Activations