INDEX
Explanations
multiple occurrences of the same word
phrase patterns emphasizing quantities or groups
Negative Logits
Rhodes      -0.70
toast       -0.60
Newark      -0.60
300         -0.59
Hercules    -0.57
Brighton    -0.56
darts       -0.54
brill       -0.53
2100        -0.53
Jericho     -0.53
Positive Logits
âĢ          1.91
âĢ          1.54
ãĢ          1.38
âĶ          1.26
âĿ          1.22
Æ           1.18
âĢł         1.17
âĸ          1.15
¨           1.14
âĶĤ         1.14
Activations Density 0.471%