Explanations
interactions involving hugging and physical affection
Negative Logits
lon            -0.17
orny           -0.17
Consumption    -0.15
upo            -0.15
acz            -0.14
autiful        -0.14
kah            -0.14
ddit           -0.14
eyed           -0.14
horn           -0.14
Positive Logits
atlas          0.14
liÄŁine        0.14
Gio            0.14
æĤ             0.14
.fig           0.14
Äįka           0.14
pressing       0.13
Seb            0.13
atable         0.13
Statics        0.13
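
Some of the tokens above (liÄŁine, æĤ, Äįka) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below shows one way to turn such strings back into text, assuming the tokenizer uses the GPT-2 style byte-to-unicode table; the page does not name the tokenizer, so that is an assumption, and the helper names `gpt2_byte_decoder` and `decode_token` are hypothetical.

```python
def gpt2_byte_decoder():
    """Build the inverse of the GPT-2 byte-to-unicode table (assumed tokenizer)."""
    # Bytes that GPT-2 maps to themselves: '!'..'~', 0xA1..0xAC, 0xAE..0xFF.
    printable = (
        list(range(0x21, 0x7F))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    mapping = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to code points 256, 257, ... in order.
    offset = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + offset)
            offset += 1
    return {ch: b for b, ch in mapping.items()}


BYTE_DECODER = gpt2_byte_decoder()


def decode_token(token: str) -> str:
    """Map each displayed character back to its byte, then decode as UTF-8."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # Tokens that end mid-character decode to U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


for tok in ["liÄŁine", "Äįka", "æĤ", ".fig"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, liÄŁine and Äįka decode to complete strings, while æĤ is only the first two bytes of a three-byte UTF-8 character and so has no standalone readable form.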
Activations Density 0.108%
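
As a rough illustration of what a density figure like this usually measures, the sketch below computes the fraction of token positions on which a feature fires. It assumes density means "share of tokens with activation above zero"; the page does not define the metric, and the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data, not taken from the page: the feature fires on 1 of 10 tokens.
toy = [0.0] * 9 + [1.5]
print(f"{activation_density(toy):.1%}")
```

Read this way, a density of 0.108% would correspond to the feature firing on roughly one token in every 900 or so.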