INDEX
Explanations
references to engagement and interaction with others
Negative Logits
Token         Logit
icie          -0.17
familiar      -0.16
acquaintance  -0.15
Fam           -0.15
çĨŁ           -0.15
amiliar       -0.14
idis          -0.14
become        -0.14
urst          -0.14
NAMESPACE     -0.13
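The token çĨŁ in the negative logits list looks like a byte-level BPE token string rather than encoding damage. A minimal sketch of how such a string could be turned back into text follows, assuming the tokenizer uses the GPT-2 style byte-to-unicode table (an assumption; this page does not name the model or tokenizer). The helper name `decode_token` is hypothetical and introduced only for illustration.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token: str) -> str:
    """Hypothetical helper: map a byte-level token string back to UTF-8 text."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in token)
    # Tokens may cut a multi-byte character in half, so replace invalid bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("çĨŁ"))
```

Under that assumption, the three characters correspond to the bytes E7 86 9F, which form a single UTF-8 character. If the dashboard's model uses a different tokenizer, the mapping does not apply and the token should be read as shown.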
Positive Logits
Token         Logit
occupied      0.23
entertained   0.22
guessing      0.20
busy          0.20
occup         0.19
Occup         0.18
Guess         0.18
occupied      0.18
Occup         0.17
ocup          0.17
Activations Density 0.019%