INDEX
    Explanations

    the presence of the word "Written" at the beginning of texts or articles

    Negative Logits
    Token        Logit
    "Ĭ±"         -0.85
    "nel"        -0.80
    " Shinra"    -0.78
    "agara"      -0.71
    "nels"       -0.70
    "ĪĴ"         -0.70
    "Sensor"     -0.69
    "allows"     -0.69
    "illon"      -0.67
    "alon"       -0.67
    Positive Logits
    Token            Logit
    "escription"     0.83
    " written"       0.76
    " aloud"         0.76
    "itatively"      0.72
    "acters"         0.72
    " eloqu"         0.70
    " instrument"    0.69
    "written"        0.69
    " intention"     0.69
    " tongue"        0.69
    Activation Density: 0.027%

    No Known Activations