INDEX
Explanations
Phrases that introduce explanations or additional information
Statements that contain the phrase "which is," signaling clarification or elaboration
Negative Logits
actory      -0.82
rongh       -0.74
otte        -0.71
ievers      -0.68
ecake       -0.67
iating      -0.66
icators     -0.65
Numbers     -0.65
igraph      -0.64
ependence   -0.63
Positive Logits
why             1.27
admittedly      1.13
understandable  1.04
basically       1.02
presumably      0.98
ironic          0.95
supposed        0.95
essentially     0.90
probably        0.89
obviously       0.89
Activation Density: 0.121%