INDEX
Explanations
phrases expressing reactions or affirmations
Negative Logits
853       -0.17
ugi       -0.15
oder      -0.15
ogs       -0.14
tisk      -0.14
867       -0.14
quent     -0.13
Conway    -0.13
ashtra    -0.13
pact      -0.13
Positive Logits
etten           0.17
enberg          0.17
.mapbox         0.16
ollah           0.15
emin            0.15
ickle           0.15
oho             0.15
InBackground    0.14
lub             0.14
_VERBOSE        0.14
Activations Density: 0.017%