Explanations
social interactions and relationships
Negative Logits
±        -0.17
ÌĨ       -0.16
ENSE     -0.15
.gt      -0.14
ngth     -0.14
okrat    -0.14
CodeAt   -0.13
uiltin   -0.13
جة       -0.13
uth      -0.13
Positive Logits
yard     0.17
whom     0.15
_DAT     0.15
shame    0.15
Shame    0.14
essler   0.14
INVAL    0.14
Yard     0.14
lack     0.13
ávÄĽ     0.13
Activations Density 0.125%