INDEX
    Explanations
    Negative Logits
     attaché        0.43
    反応              0.42
    福岡              0.42
     Davao          0.41
     Filipino       0.40
     isotonic       0.39
    タイ              0.39
     pipa           0.39
     hukum          0.38
     hidden         0.38
    Positive Logits
    Transformers    0.58
     Transformers   0.55
    🐾               0.50
     stallions      0.47
    ltry            0.46
     Transformer    0.46
     transformers   0.45
     어느             0.45
     Equest         0.43
                    0.42
    Activation Density 0.068%

    No Known Activations