    Explanations

    negations or expressions of doubt and disbelief

    Negative Logits
    " tilgjenge"      -0.63
    " Jefus"          -0.62
    " becauſe"        -0.59
    " eſt"            -0.57
    " againſt"        -0.55
    " Available"      -0.54
    " circonst"       -0.53
    " interessanti"   -0.52
    " acestea"        -0.52
    " perfons"        -0.52
    Positive Logits
    " want"           1.05
    " knew"           1.02
    " wanted"         1.00
    "didn"            0.98
    "want"            0.98
    " thought"        0.96
    " liked"          0.95
    " think"          0.94
    " hate"           0.94
    " know"           0.94
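Dashboards of this kind often derive logit lists by projecting the feature's decoder direction through the model's unembedding matrix. They then report the tokens with the largest and smallest values. It is an assumption that this page computes its lists exactly this way, because real pipelines may also fold in a final layer norm or other scaling. The sketch below uses plain Python. The function names `logit_effects` and `top_and_bottom` and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a feature's decoder direction onto every token.

    decoder_dir has length d_model; w_u has d_model rows, each of length vocab_size.
    Assumes no layer norm or bias is applied (a simplification).
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

# Toy example with a 2-dimensional residual stream and 4 tokens (made-up values).
vocab = [" want", " knew", " eſt", " Jefus"]
decoder_dir = [1.0, -0.5]
w_u = [
    [0.8, 0.7, -0.2, -0.3],
    [-0.4, -0.2, 0.5, 0.4],
]
top, bottom = top_and_bottom(logit_effects(decoder_dir, w_u), vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

Leading spaces stay part of each token string. That is why " want" and "want" can appear as separate entries in the lists above.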
    Activation Density 0.164%
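Activation density is commonly reported as the percentage of sampled tokens on which the feature's activation is above zero. A density of 0.164% therefore corresponds to roughly 164 firing tokens per 100,000. It is an assumption that this dashboard uses a threshold of exactly zero. The helper name `activation_density` below is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens where the feature fires.

    A token counts as firing when its activation is strictly above `threshold`.
    Returns 0.0 for an empty input rather than dividing by zero.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Made-up sample: 164 firing tokens out of 100,000.
sample = [0.0] * 99_836 + [0.3] * 164
print(f"{activation_density(sample):.3f}%")
```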

    No Known Activations