Explanations
references to the ending of a time period or completion of an activity
Negative Logits
Ranked           -0.76
weeney           -0.69
umn              -0.65
angles           -0.64
ean              -0.64
disadvantaged    -0.62
eq               -0.62
oS               -0.62
abuses           -0.57
İĭ               -0.57
Positive Logits
reckoning        0.80
!                0.77
goodbye          0.77
!'               0.76
joy              0.72
!:               0.71
!"               0.69
Congratulations  0.68
:(               0.68
!,               0.68
Activation Density: 0.331%