INDEX
Explanations
phrases expressing strong agreement or endorsement
expressions of certainty or affirmation
Negative Logits
ently        -0.80
sembly       -0.80
roups        -0.78
Mour         -0.77
moreover     -0.69
acity        -0.69
aciously     -0.68
soever       -0.67
entary       -0.67
Simulator    -0.67
Positive Logits
gonna         0.75
recommend     0.72
wanna         0.72
gotta         0.69
got           0.69
underrated    0.65
NOT           0.63
uptick        0.63
qualifies     0.62
correlated    0.62
Activations Density: 0.046%