INDEX
    Explanations

    phrases indicating causation or justification

    Negative Logits
    ENN
    -0.16
    åĸ
    -0.15
    ennen
    -0.15
    olk
    -0.15
    lements
    -0.14
     siendo
    -0.14
    ffic
    -0.14
    anto
    -0.14
    oker
    -0.14
    ático
    -0.14
    Positive Logits
    bane
    0.17
    ÐĴС
    0.15
    ocked
    0.15
     Prince
    0.14
    ADB
    0.14
    ÛĮÙĨÙĩ
    0.14
    dale
    0.14
     Compatible
    0.14
    лÑĥÑĪ
    0.14
    adm
    0.13
    Act Density 0.002%

    No Known Activations