INDEX
Explanations
usernames or names in a specific format
specific identifiers and nouns related to usernames, lists, and entities
Negative Logits
Cub         -0.61
Mug         -0.56
Negro       -0.56
FedEx       -0.53
zac         -0.52
othermal    -0.52
uba         -0.52
ivable      -0.51
alike       -0.50
obo         -0.50
Positive Logits
and         0.97
&           0.97
&           0.96
AND         0.94
/           0.94
/           0.86
and         0.81
or          0.80
andowski    0.76
ands        0.75
Activations Density: 0.641%