INDEX
    Explanations

    phrases related to social interactions and group activities

    Negative Logits
    …↵
    -0.28
    …and
    -0.25
    …”
    -0.24
    …
    -0.23
    ….
    -0.22
     …↵
    -0.22
    …I
    -0.21
    …the
    -0.21
    …but
    -0.21
    ……
    -0.21
    Positive Logits
    #af
    0.16
    #ab
    0.16
    #ac
    0.16
    #ad
    0.15
     -*-č↵
    0.15
    )frame
    0.14
    )application
    0.14
    #aa
    0.14
    @js
    0.14
    /******/
    0.13
    Activation Density 98.153%

    No Known Activations