Explanations
short phrases that introduce or summarize information
phrases that indicate summaries or overviews
Negative Logits
anism       -0.80
['          -0.75
acid        -0.71
same        -0.71
cair        -0.70
agents      -0.69
eg          -0.68
hes         -0.68
evidence    -0.67
alties      -0.67
Positive Logits
couple      1.15
few         1.10
glimpse     1.09
bunch       1.07
lot         1.06
handful     1.02
cknowled    1.02
slew        0.98
plethora    0.95
snippet     0.94
Activations Density: 0.368%