INDEX
Explanations
references to personal struggles and mental health issues
Negative Logits
CHIP       -0.17
ifestyles  -0.15
_nat       -0.14
ulum       -0.14
大人       -0.13
üle        -0.13
arov       -0.13
video      -0.13
esub       -0.13
           -0.13
Positive Logits
letters     0.17
publishers  0.17
writing     0.17
correspond  0.16
corres      0.15
letter      0.15
curacy      0.15
Publishers  0.15
mực         0.15
publishing  0.14
Activations Density 0.056%