
Explanations

references to the influence or effect of various factors


Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted


Negative Logits

Token             Logit
" impact"         -0.77
" impacto"        -0.68
"impact"          -0.66
" Impact"         -0.58
"Impact"          -0.58
"sorted"          -0.55
" nở"             -0.51
"ogaster"         -0.49
"AnchorStyles"    -0.48
" Dumas"          -0.47

Positive Logits

Token             Logit
" influenced"     1.73
"influenced"      1.34
" Influ"          0.99
" influenci"      0.97
" swayed"         0.87
"Influ"           0.86
"engaruhi"        0.85
" beeinf"         0.81
" RouterModule"   0.76
" Influences"     0.76

Activations Density: 0.003%
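
To put the density figure in context, the sketch below estimates how many tokens in the dashboard sample the feature would fire on. It assumes that "Activations Density" is the fraction of sampled tokens with a nonzero activation, and that it was measured over the same 24,576 prompts of 128 tokens listed under Configuration. The page states neither, and the displayed percentage is probably rounded, so the result is only a rough estimate. The function names are illustrative and not part of any dashboard API.

```python
# Figures copied from the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.003  # as displayed; likely rounded


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of tokens in the dashboard sample."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(n_tokens: int, density_percent: float) -> float:
    """Estimated count of tokens on which the feature is active.

    Assumption: density is the fraction of tokens with a nonzero activation.
    """
    return n_tokens * density_percent / 100.0


if __name__ == "__main__":
    n = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    print(f"tokens in sample: {n:,}")
    print(f"estimated active tokens: {expected_active_tokens(n, ACTIVATION_DENSITY_PERCENT):.0f}")
```

Under these assumptions the sample holds about 3.1 million tokens and the feature is active on roughly a hundred of them, which fits a feature narrowly tied to words such as "influenced".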