Explanations
the presence of a particular character or symbol representing a brand
Negative Logits
scattering          -0.81
DPR                 -0.73
lodging             -0.64
scatter             -0.63
purse               -0.62
guiActiveUnfocused  -0.61
Ukrain              -0.61
blender             -0.61
muse                -0.61
Counsel             -0.60
Positive Logits
º       1.15
¹       1.14
£       1.03
Į       0.99
ı       0.96
į       0.94
Ĵ       0.93
¿       0.92
§       0.90
agree   0.89
Activations Density: 0.195%