INDEX
Explanations
phrases related to formal legal contexts
references to British culture or entities
Negative Logits
str      -0.78
tx       -0.74
pu       -0.73
Actor    -0.73
yi       -0.69
iri      -0.69
guy      -0.66
Qiao     -0.65
apor     -0.64
pex      -0.64
Positive Logits
Brit         1.08
published    0.86
publishes    0.81
published    0.74
ゼウス       0.70
births       0.69
possessions  0.68
miscar       0.68
RELE         0.68
sheds        0.67
Activations Density 0.001%