INDEX
Explanations
mentions of mental health-related terms or concepts
references to mental health topics
Negative Logits
advertisement    -0.80
ICA              -0.78
IRD              -0.71
oulos            -0.71
ded              -0.69
aday             -0.69
gger             -0.68
ARDS             -0.68
rb               -0.67
ELS              -0.66
Positive Logits
faculties        0.98
illness          0.93
disorders        0.89
defic            0.87
disorder         0.84
itary            0.82
wellbeing        0.81
izing            0.80
ising            0.80
retard           0.79
Activations Density: 0.012%