INDEX
Explanations
the presence of the string "ada" in various forms
Negative Logits
names     -0.73
icles     -0.71
giving    -0.69
sheet     -0.68
rophic    -0.67
ãĤĮ       -0.66
mother    -0.65
ician     -0.65
taking    -0.65
URES      -0.65
Positive Logits
uthor     0.99
qua       0.97
$$        0.92
ÄŁ        0.90
illac     0.88
elta      0.81
qa        0.81
BIP       0.80
q         0.80
ibur      0.78
Activations Density 0.009%