Explanations
phrases emphasizing significant quantities or strengths
Negative Logits
ysc            -0.74
uay            -0.72
rick           -0.70
Origins        -0.70
flies          -0.70
Annotations    -0.68
runs           -0.65
//             -0.64
np             -0.64
<              -0.64
Positive Logits
thing          1.23
scenario       1.04
situation      0.95
feat           0.90
delicate       0.89
drastic        0.87
huge           0.86
sensitive      0.86
hypothetical   0.85
possibility    0.82
Activations Density: 0.026%