INDEX
    Explanations

    categories and specific words

    Negative Logits
     sensors            0.64
     equine             0.57
     by                 0.55
     commanders         0.54
     inverter           0.53
     creatinine         0.53
     atta               0.52
     films              0.52
     stockp             0.52
     industrialists     0.52
    Positive Logits
     ngữ                0.44
    '};                 0.44
                        0.43
    Pressed             0.42
     كه                 0.42
     Bedingungen        0.42
     hại                0.41
    "};                 0.41
     บ้าง                0.40
    ประเภท              0.40
    Act Density 0.001%

    No Known Activations