INDEX
Explanations
instances of parentheses and their associated content
Negative Logits
æıı       -0.15
Boyle     -0.14
éĨ        -0.14
uelle     -0.14
ANTA      -0.14
umas      -0.14
æĪ        -0.14
Butter    -0.13
README    -0.13
Spy       -0.13
Positive Logits
literal    0.66
literally  0.63
Liter      0.60
liter      0.57
pun        0.56
pun        0.53
literal    0.50
figur      0.50
Literal    0.50
Liter      0.48
Activations Density 0.108%