Explanations
phrases indicating some level of certainty or comparison, often involving the phrase "at least."
phrases indicating a minimum or a baseline condition
Negative Logits
rence      -0.74
FANT       -0.70
bath       -0.69
rend       -0.67
icides     -0.65
iard       -0.64
ãĥ¼ãĤ¯     -0.63
bern       -0.63
ses        -0.61
shr        -0.60
Positive Logits
partly          0.74
uner            0.71
partially       0.69
fair            0.69
judging         0.69
ety             0.67
toler           0.65
theoretically   0.65
temporarily     0.63
een             0.62
Activations Density 0.023%