INDEX
    Explanations

    words related to support or help

    features related to characteristics of individuals or identity

    Negative Logits
    ".</"              -0.82
    (token missing)    -0.77
    ".?"               -0.76
    ".''."             -0.76
    ".—"               -0.73
    ".["               -0.71
    " ―"               -0.71
    "."                -0.70
    "†"                -0.70
    "*."               -0.69
    Positive Logits
    " however"          0.94
    " tho"              0.78
    " meanwhile"        0.77
    " alot"             0.72
    " organise"         0.70
    " though"           0.69
    " realise"          0.64
    " organising"       0.63
    "anwhile"           0.63
    " learnt"           0.60
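    Dashboards of this kind typically build the positive and negative logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and ranking the per-token scores. This page does not confirm that convention, so the sketch below assumes it. The vectors, the toy vocabulary, and the helper names `logit_effects` and `top_logits` are all hypothetical stand-ins for real model weights.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (hypothetical) feature
    decoder direction with that token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_logits(scores: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy 3-dimensional example; all numbers are made up for illustration.
decoder_dir = [0.5, -0.2, 0.1]
unembed = {
    " however": [1.0, -1.0, 0.0],
    " tho": [0.8, -0.5, 0.2],
    ".": [-1.0, 0.5, 0.0],
    ".?": [-0.6, 0.9, -0.3],
}
scores = logit_effects(decoder_dir, unembed)
pos, neg = top_logits(scores, k=2)
print("positive:", pos)
print("negative:", neg)
```

    Leading spaces are kept in the token strings because, in byte-level BPE vocabularies, " however" and "however" are different tokens.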
    Act Density 1.589%
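    Act Density is usually read as the percentage of sampled tokens on which the feature's activation is above zero. Under that reading, 1.589% would mean the feature fires on roughly 1.6 of every 100 tokens. The minimal sketch below assumes that definition. The `threshold` parameter and the activation values are hypothetical, because this page states neither the sampling corpus nor the cutoff.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    `threshold` is a hypothetical parameter; the dashboard's own cutoff is unknown.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical activations for 8 tokens: the feature fires on 2 of them.
acts = [0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 1.1, 0.0]
print(f"Act Density {activation_density(acts):.3f}%")  # Act Density 25.000%
```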

    No Known Activations