INDEX
Explanations
phrases related to potential outcomes or consequences
phrases that suggest potential consequences or possibilities
Negative Logits
teen      -0.72
cloth     -0.67
washer    -0.67
raining   -0.66
raint     -0.62
cies      -0.62
core      -0.60
got       -0.59
honors    -0.59
Maker     -0.59
Positive Logits
feas            1.28
conce           1.25
be              1.04
potentially     1.02
possibly        0.98
tremend         0.97
theoretically   0.96
hypot           0.96
someday         0.94
berra           0.94
Activations Density 0.090%