Explanations
references to safe and supportive spaces or environments for various communities
Negative Logits
مقام    -0.15
illa    -0.14
icot    -0.14
ACING   -0.14
ahoo    -0.14
ою      -0.13
ieber   -0.13
keyed   -0.13
åµ      -0.13
orb     -0.13
Positive Logits
atmosphere    0.22
environment   0.20
ortam         0.16
атмос         0.16
stigma        0.15
Atmos         0.15
สำหร          0.15
safe          0.14
space         0.14
Indented      0.14
Activations Density 0.084%