INDEX
Explanations
references to mental health issues and their implications
Negative Logits
کری      -0.14
zy       -0.14
jn       -0.14
eworthy  -0.14
ochen    -0.13
ýn       -0.13
ари      -0.13
čí       -0.13
egrity   -0.13
XR       -0.13
POSITIVE LOGITS
reason   0.85
reasons  0.73
reason   0.65
原因     0.61
Reason   0.59
Reasons  0.59
Reason   0.58
why      0.58
Why      0.57
이유     0.54
Activations Density 0.272%