INDEX
Explanations
long sequences of capital letters
repeated mentions of the word "Long."
Negative Logits
Token           Logit
babys           -0.68
seating         -0.67
acting          -0.67
cabinet         -0.66
applicable      -0.65
trust           -0.64
recept          -0.63
availability    -0.62
attending       -0.62
Arch            -0.62
Positive Logits
Token           Logit
Long            3.59
long            2.07
Short           1.96
LONG            1.69
Long            1.65
Little          1.41
short           1.33
Large           1.30
Old             1.29
Length          1.28
Activations Density: 0.016%