INDEX
Explanations
phrases that introduce a statement or topic
phrases that introduce or reference topics of discussion
Negative Logits
~~~~~~~~    -0.70
eries       -0.69
PIN         -0.69
î           -0.68
knit        -0.67
CTV         -0.67
rils        -0.67
��          -0.66
\'          -0.65
frac        -0.65
POSITIVE LOGITS
specifics    0.79
concluding   0.78
naming       0.76
praising     0.74
mentioning   0.71
dismissing   0.70
parting      0.69
correcting   0.69
criticism    0.68
ebus         0.67
Activations Density 0.149%