Explanations
phrases indicating worthiness or deservingness
Negative Logits
ullivan       -0.57
glers         -0.53
synchronized  -0.51
nels          -0.50
interstate    -0.48
portals       -0.47
lems          -0.47
processes     -0.46
uthor         -0.46
surfing       -0.46
Positive Logits
praise         0.65
scorn          0.61
ridicule       0.58
$$$$           0.55
COMPLE         0.55
criticism      0.54
cipled         0.54
cellence       0.54
inement        0.53
ortality       0.53
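The card does not say how these token lists were produced. One common approach, assumed here rather than confirmed by the source, is to project the feature's decoder direction onto the model's unembedding matrix and report the lowest- and highest-scoring vocabulary tokens. The sketch below does that in plain Python; `logit_lens_tokens`, `direction`, `w_unembed`, and `vocab` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple


def logit_lens_tokens(
    direction: List[float],        # hypothetical feature decoder direction, length d_model
    w_unembed: List[List[float]],  # hypothetical unembedding, shape [d_model][vocab_size]
    vocab: List[str],              # token strings, length vocab_size
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by direct logit effect."""
    d_model = len(direction)
    # Logit effect on token j is the dot product of the direction with column j.
    scores = [
        sum(direction[i] * w_unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy usage with made-up numbers (not the real model weights):
pos, neg = logit_lens_tokens(
    direction=[1.0, 0.0],
    w_unembed=[[0.65, -0.57, 0.10], [0.0, 0.0, 0.0]],
    vocab=["praise", "ullivan", "x"],
    k=1,
)
```

Under this assumption, a positive score means the feature pushes that token's logit up when it fires, and a negative score means it pushes the logit down.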
Activation Density: 11.477%
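The card reports density as a single percentage without defining it. A typical reading, assumed here, is the fraction of sampled token positions at which the feature's activation is above zero. A minimal sketch under that assumption, where `activation_density` and its `threshold` parameter are hypothetical:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return active / total


# Toy usage: 2 of 4 positions fire, so the density is 50%.
density = activation_density([0.0, 1.2, 0.0, 0.3])
label = f"{100 * density:.3f}%"
```

Read this way, a density of 11.477% would mean the feature fires on roughly one token in nine of the sampled text.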