INDEX
Explanations
instances of negative feedback and mental health references
Negative Logits
mga          -0.16
UNG          -0.15
人们         -0.15
IPs          -0.14
ungle        -0.14
types        -0.14
sects        -0.14
workforce    -0.14
entries      -0.14
various      -0.14
Positive Logits
item         0.28
piece        0.26
member       0.23
之一         0.22
person       0.21
instance     0.20
member       0.20
molecule     0.20
Piece        0.19
element      0.18
Activations Density 1.195%