INDEX
Explanations
positive evaluations of experiences or outcomes
Follows "Q:" or "Section" or contains non-English words
Negative Logits
dafx         -0.60
Externé      -0.54
thousands    -0.51
unless       -0.51
hundreds     -0.48
hundreds     -0.48
thousands    -0.47
ugly         -0.46
Izvori       -0.46
Blame        -0.46
Positive Logits
estimés      0.70
benefitted   0.67
benefited    0.64
aprend       0.62
mentors      0.62
Aprend       0.62
enriquec     0.61
enri         0.60
             0.60
enriching    0.60
Activations Density 0.170%