INDEX
Explanations
negative contractions and possessive forms
Character following an apostrophe or number
numbers and letters
Negative Logits
–,        -0.79
─         -0.76
[{        -0.69
▼         -0.68
'\\;'     -0.67
’”        -0.65
、“       -0.63
]),       -0.63
.–        -0.63
――        -0.63
Positive Logits
I         0.87
ppl       0.67
:)        0.65
it        0.64
nice      0.64
دیگه      0.63
guys      0.63
şey       0.63
my        0.62
stupid    0.62
Activations Density 0.179%
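
A minimal sketch of how a density figure like the one above could be computed, assuming "activations density" means the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function names, and the sample data are all assumptions for illustration; the page does not state how its figure was produced.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: a feature "fires" on a token when its activation
    exceeds the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


def format_percent(fraction, digits=3):
    """Format a fraction as a percentage string, e.g. 0.00179 -> '0.179%'."""
    return f"{fraction * 100:.{digits}f}%"


if __name__ == "__main__":
    # Hypothetical sample: 179 firing tokens out of 100,000.
    # These counts are made up to illustrate the arithmetic only.
    sample = [1.0] * 179 + [0.0] * 99_821
    print(format_percent(activation_density(sample)))  # prints 0.179%
```

Under this reading, a density of 0.179% would mean the feature fires on roughly 1 token in 560.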