Explanations
discussions surrounding social issues and personal experiences
Negative Logits
:-)       -0.17
sez       -0.16
OK        -0.16
amping    -0.16
Anyway    -0.15
culo      -0.14
dit       -0.14
надо      -0.13
ego       -0.13
uggle     -0.13
Positive Logits
amongst       0.23
surrounding   0.22
whilst        0.21
due           0.21
ãĥ¼           0.21
seperate      0.20
-esque        0.19
strictly      0.19
towards       0.19
within        0.19
Activations Density: 1.332%