INDEX
    Explanations

    thoughts, feelings, fantasies, impulses

    Negative Logits
    " sometimes"    0.66
    " spesso"       0.63
    " גם"           0.62
    " parfois"      0.62
    "*,"            0.61
    " then"         0.61
    " סט"           0.60
    "sometimes"     0.60
    "גם"            0.59
    " ε"            0.59
    Positive Logits
    " would"        0.98
    " wouldn"       0.94
    "receiving"     0.90
    "would"         0.89
    " receiving"    0.88
    " couldn"       0.83
    " find"         0.82
    " have"         0.81
    " Would"        0.80
    "finden"        0.79
    Act Density 0.113%

    No Known Activations