INDEX
Explanations
proper names, possibly including surnames
references to individuals, particularly journalists or public figures
Negative Logits
WARN        -0.71
llers       -0.67
ously       -0.67
Clarkson    -0.67
cavity      -0.65
lly         -0.64
rency       -0.62
checks      -0.60
balloons    -0.58
des         -0.58
Positive Logits
ivas         1.24
anus         0.86
terness      0.86
anian        0.82
ahu          0.81
arov         0.81
ques         0.81
anos         0.80
lov          0.80
anas         0.78
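The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The sketch below shows one common way such lists are produced, assuming (not confirmed by this page) that each token's score is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. The names `logit_effects` and `top_and_bottom`, and the toy matrices, are hypothetical and for illustration only.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score every token by decoder_dir @ unembed[:, j].

    Assumption: decoder_dir has d_model floats and unembed is a
    d_model x len(vocab) matrix stored as a list of rows.
    """
    scores = []
    for j, tok in enumerate(vocab):
        # Dot product of the feature direction with token j's unembedding column.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return scores


def top_and_bottom(scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    positive = ranked[-k:][::-1]  # largest first
    negative = ranked[:k]         # most negative first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["a", "b", "c"]
decoder_dir = [1.0, -0.5]
unembed = [[1.0, 0.0, 2.0],
           [0.0, 1.0, 1.0]]
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
print(pos, neg)
```

On the toy input this prints `[('c', 1.5)] [('b', -0.5)]`. Real dashboards may additionally normalize the decoder direction or apply the final layer norm's scale, so absolute values are not directly comparable across tools.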
Activation Density: 0.015%
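Activation density is read here as the percentage of sampled tokens on which the feature fires. Under that reading, 0.015% means roughly 15 activating tokens per 100,000. A minimal sketch, assuming a token counts as "firing" when its activation exceeds zero; the function name and the `threshold` parameter are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 15 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_985 + [1.0] * 15
print(activation_density(acts))
```

A very low density such as this one is consistent with a narrow feature, like one tied to specific proper names, though the density alone does not establish what the feature represents.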