INDEX
Explanations
phrases that introduce examples or explanations
Negative Logits
Token             Logit
OGND              -1.05
AndEndTag         -0.90
Diwedd            -0.83
cdti              -0.79
Majefty           -0.73
Houſe             -0.72
itſelf            -0.71
InjectAttribute   -0.70
ⓧ                 -0.70
่านั้น              -0.70
Positive Logits
Token             Logit
:                 0.91
consider          0.67
imagine           0.65
when              0.63
The               0.58
When              0.56
if                0.55
suppose           0.55
Suppose           0.55
the               0.53
Activations Density 0.173%