INDEX
Explanations
phrases that indicate progress or advancement towards a goal
phrases indicating progress or future expectations
Negative Logits
PLIED   -0.63
soType  -0.61
detail  -0.60
amples  -0.59
tein    -0.58
aida    -0.58
Toggle  -0.57
isms    -0.57
ecause  -0.55
mop     -0.55
Positive Logits
to             0.99
toward         0.91
towards        0.82
to             0.70
graded         0.70
llor           0.70
To             0.68
for            0.67
PsyNetMessage  0.67
inex           0.66
Activations Density 0.123%