Explanations
phrases mentioning specific durations or time periods
phrases that indicate a minimum or threshold requirement
Negative Logits
aceutical   -0.64
FTWARE      -0.61
kindly      -0.61
Carmen      -0.60
Reviewer    -0.56
Giul        -0.55
Marian      -0.54
drums       -0.54
disliked    -0.53
Berry       -0.53
Positive Logits
least       1.56
onement     1.31
roph        1.13
yp          0.99
rial        0.97
hens        0.95
olls        0.89
rophic      0.89
abase       0.88
las         0.87
Activations Density 0.094%
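
The page does not define the density figure. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the threshold, and the toy data are hypothetical and not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 10,000 token positions, 10 of them active.
sample = [0.0] * 9990 + [1.2] * 10
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.100%
```

Under that assumption, a density of 0.094% would correspond to roughly 9 or 10 active positions per 10,000 sampled tokens.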