INDEX
Explanations
phrases indicating certainty or prediction
phrases indicating future events or possibilities
Negative Logits
buster       -0.67
endeavor     -0.62
plug         -0.58
era          -0.58
oriented     -0.56
achi         -0.55
irs          -0.54
zer          -0.53
didnt        -0.52
endeavour    -0.52
Positive Logits
plenty        1.07
ample         0.86
lots          0.83
no            0.80
some          0.80
exceptions    0.79
tons          0.78
ample         0.77
alot          0.75
fewer         0.74
Activations Density 0.038%