INDEX
Explanations
names with a specific pattern
names of individuals, particularly those involved in legal or political contexts
NEGATIVE LOGITS
Token               Logit
sylv                -0.62
ascript             -0.58
fict                -0.56
natureconservancy   -0.56
ACTIONS             -0.55
scorp               -0.54
sugg                -0.54
acebook             -0.53
cryst               -0.53
CTRL                -0.53
POSITIVE LOGITS
Token               Logit
ban                 0.80
ovsky               0.77
allah               0.77
akis                0.74
aya                 0.72
án                  0.72
ich                 0.71
nik                 0.71
aja                 0.70
je                  0.69
Activations Density 0.478%
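
A density figure like the one above is commonly read as the percentage of token positions on which the feature fires. The sketch below illustrates that reading only; the function name, the zero threshold, and the sample data are assumptions made for illustration, not this page's actual computation.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of entries strictly above `threshold`.

    Assumption: "density" means the share of token positions with a
    non-zero activation; the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 1000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
print(f"{activation_density(sample):.3f}%")  # prints 0.500%
```

Under this reading, 0.478% would mean the feature is active on roughly 5 of every 1000 tokens sampled.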