Explanations
phrases indicating future actions
phrases indicating future actions or intentions
Negative Logits
Token        Logit
CTV          -0.75
cius         -0.75
mere         -0.74
Reporting    -0.73
SourceFile   -0.67
sett         -0.66
NS           -0.65
sav          -0.64
checking     -0.64
cart         -0.64
Positive Logits
Token        Logit
be            1.02
explode       0.96
stick         0.93
hell          0.93
lose          0.92
need          0.92
try           0.92
get           0.92
make          0.89
unleash       0.87
Activation Density: 0.080%