Explanations
Tokens encoded in a specific format that resembles identifiers
Occurrences of the number four
NEGATIVE LOGITS
vier        -0.76
esville     -0.76
venue       -0.72
Whitman     -0.72
Cheong      -0.70
clair       -0.64
becca       -0.62
iday        -0.61
Winchester  -0.61
Vaugh       -0.61
POSITIVE LOGITS
teenth       1.22
teen         1.16
eva          1.00
hyde         0.86
Chan         0.83
some         0.82
66666666     0.79
amaz         0.79
cyl          0.79
th           0.78
Activation density: 0.083%