INDEX
Explanations
names or mentions of a specific individual, potentially a public figure
occurrences of names or titles followed by variations of 'Pr' indicating a connection to individuals, particularly in a formal or official context
Negative Logits
Token                    Logit
BuyableInstoreAndOnline  -0.72
seiz                     -0.67
aston                    -0.62
Wonderland               -0.62
romeda                   -0.61
DIR                      -0.61
¥µ                       -0.59
Monroe                   -0.58
Arri                     -0.57
disadvant                -0.57
Positive Logits
Token       Logit
ciples      1.10
cipled      1.03
itte        0.86
cess        0.81
xus         0.75
bably       0.71
mend        0.70
aceutical   0.69
fecture     0.69
achine      0.67
Activations Density: 0.053%