INDEX
    Explanations

    proper nouns and names of places or entities

    Negative Logits
    Token           Logit
    "               -0.46
    (whitespace)    -0.44
    all             -0.43
    “               -0.42
                    -0.40
    and             -0.40
    or              -0.40
    (               -0.39
    not             -0.39
    in              -0.38
    Positive Logits
    Token             Logit
    المعيارى          1.02
    ロウィン          0.91
    betweenstory      0.90
    שוליים            0.84
    WriteTagHelper    0.84
    ſelben            0.83
    ſammen            0.82
    [@BOS@]           0.79
    <unused14>        0.79
    <unused28>        0.79
    Activation Density: 1.047%

    No Known Activations