INDEX
Explanations
phrases that denote an overall assessment or evaluation
Negative Logits
ker        -0.16
pond       -0.16
ãn         -0.16
ess        -0.15
ellery     -0.14
apore      -0.14
ovsky      -0.14
ære        -0.14
isco       -0.14
sport      -0.14
Positive Logits
iese       0.16
ingham     0.15
/down      0.15
iz         0.15
enga       0.15
MENT       0.14
most       0.14
asel       0.14
iss        0.14
-purpose   0.13
Activations Density: 0.015%