INDEX
    Explanations

    phrases that indicate explanations and justifications for phenomena

    Negative Logits
    "expandindo"       -0.68
    "OGND"             -0.62
    "Hauptartikel"     -0.57
    " Biôgrafia"       -0.55
    "(!__"             -0.54
    " kasarigan"       -0.53
    "########."        -0.51
    " springfox"       -0.50
    " Photocase"       -0.50
    "unknownFields"    -0.50
    Positive Logits
    " why"             0.56
    "why"              0.47
    " mysterious"      0.45
    " suspiciously"    0.45
    " varför"          0.44
    " WHY"             0.42
    " interpreting"    0.41
    " purposes"        0.41
    " unexplained"     0.40
    " observed"        0.40
    Activation Density: 1.612%

    No Known Activations