INDEX
Explanations
mentions of user-related identifiers, specifically the term "username"
Negative Logits
ValueStyle    -1.03
Seeder        -0.93
دانشنامهٔ    -0.93
awaiter       -0.83
Moq           -0.82
setof         -0.81
apollo        -0.80
Lyra          -0.79
*}$           -0.79
Tikang        -0.79
Positive Logits
username      0.77
whoſe         0.73
Wur           0.73
username      0.71
mistic        0.71
Theſe         0.71
Username      0.68
Wür           0.67
setUsername   0.64
Username      0.64
Activations Density 0.117%