INDEX
Explanations
phrases indicating worthiness or deservingness
expressions of worthiness or merit
Negative Logits
ullivan        -0.81
glers          -0.75
nels           -0.69
interstate     -0.67
lems           -0.67
ories          -0.66
synchronized   -0.64
processes      -0.62
surfing        -0.61
recruited      -0.61
Positive Logits
praise         0.88
cellence       0.78
scorn          0.77
criticism      0.75
EDIT           0.75
cipled         0.74
ridicule       0.70
ãģĭ            0.70
accol          0.70
inguished      0.70
Activations Density 0.102%