INDEX
Explanations
instances of words that are followed by punctuation marks
occurrences of punctuation and words indicating relationships or dependencies
Negative Logits
Token             Logit
frog              -0.69
HEAD              -0.69
Sham              -0.67
JV                -0.67
CRE               -0.66
Responsibility    -0.60
riors             -0.59
SAY               -0.58
front             -0.58
Lean              -0.58
Positive Logits
Token             Logit
uated             1.83
uating            1.54
uates             1.43
uation            1.35
uate              1.30
uations           1.22
uality            1.12
iously            1.10
aneous            1.10
uitous            1.02
Activations Density: 0.046%
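
The density figure can be read as the share of sampled token positions on which the feature fires at all. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical, and the page itself does not say how its figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of token positions where the
    feature is active; the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 5 of which fire.
sample = [0.0] * 9995 + [0.7, 1.2, 0.3, 2.4, 0.9]
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.050%"
```

On this reading, a density of 0.046% would correspond to roughly 46 active positions per 100,000 sampled tokens.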