Explanations
words related to intimacy and close relationships
Negative Logits
انات     -0.15
bach     -0.15
radu     -0.15
iam      -0.14
baum     -0.14
_impl    -0.14
hop      -0.14
bsub     -0.14
anch     -0.14
erald    -0.14
Positive Logits
Priv     0.16
dire     0.15
heet     0.15
ease     0.15
ties     0.15
اقة      0.14
priv     0.14
ease     0.14
ôi       0.14
©        0.13
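One common way to produce positive and negative logit lists like these is to project the feature's decoder direction through the model's unembedding matrix. You then read off the highest and lowest scoring vocabulary tokens. The page does not say how its values were computed, so treat this as an assumption. The sketch below also ignores any final layer normalization. The names `decoder_dir`, `W_U` and `logit_lists` are hypothetical, and the toy weights and vocabulary are made up for illustration.

```python
import numpy as np


def logit_lists(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the output logits.

    decoder_dir: (d_model,) decoder direction of the feature (assumed input).
    W_U: (d_model, vocab_size) unembedding matrix (assumed input).
    vocab: list of vocab_size token strings.
    Returns (positive, negative): the k highest and k lowest (token, logit) pairs.
    """
    logits = decoder_dir @ W_U            # (vocab_size,) direct logit contribution
    order = np.argsort(logits)            # indices sorted from lowest to highest
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with made-up weights and a hypothetical 5-token vocabulary.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d", "tok_e"]
W_U = np.array([[1.0, 0.5, -1.0, -0.5, 0.2],
                [0.0, 0.5, 0.0, -0.5, 0.1]])
decoder_dir = np.array([1.0, 0.0])
positive, negative = logit_lists(decoder_dir, W_U, vocab, k=2)
print(positive)  # [('tok_a', 1.0), ('tok_b', 0.5)]
print(negative)  # [('tok_c', -1.0), ('tok_d', -0.5)]
```

Under this reading, the lists describe what the feature pushes the next-token prediction toward or away from. They do not describe the inputs that make it fire.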
Activation Density 0.015%
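A density of 0.015% means the feature was active on roughly 15 of every 100,000 tokens in the sample. The sketch below computes density as the fraction of token positions with activation strictly above a threshold. The dataset and threshold behind the figure above are not stated, so the `threshold=0.0` default and the hypothetical `activation_density` helper are assumptions.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where a feature is active (activation > threshold)."""
    acts = np.asarray(acts, dtype=float)
    return np.count_nonzero(acts > threshold) / acts.size


# Hypothetical sample: 15 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:15] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.015%
```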