    Negative Logits
    Token              Logit
    " alleg"           -0.09
    " gambling"        -0.09
    " Gambling"        -0.09
    " assimil"         -0.09
    " eyewitness"      -0.08
    "小說"             -0.08
    " encycl"          -0.08
    " slaughter"       -0.08
    " encyclopedia"    -0.08
    " বন্দ"            -0.08
    Positive Logits
    Token              Logit
    " CSS"             0.16
    "CSS"              0.15
    " typography"      0.14
    ".css"             0.14
    " stylesheet"      0.14
    " css"             0.14
    "css"              0.14
    "<style"           0.13
    "Styles"           0.13
    " Styling"         0.13
    Activation Density: 0.022%

    No Known Activations