INDEX
Explanations
expressions of opinion or belief
phrases expressing collective opinions or observations
Negative Logits
forms         -0.67
REDACTED      -0.63
aults         -0.60
Glory         -0.58
ulence        -0.56
rocket        -0.56
accompan      -0.54
Romance       -0.53
à             -0.51
Reloaded      -0.51
Positive Logits
akening       1.28
're           1.21
've           1.09
need          1.02
akens         1.02
ighed         1.01
shouldn       0.97
believe       0.96
anticipate    0.94
haven         0.91
Activations Density 0.197%