Explanations
ages or numerical information related to age
mentions of ages or age-related descriptors
Negative Logits
hops       -0.73
seek       -0.60
Apps       -0.60
anus       -0.59
afety      -0.59
tags       -0.58
chat       -0.56
cart       -0.55
trigger    -0.55
balcon     -0.55
Positive Logits
old        1.28
olds       1.25
-          1.12
old        1.11
veteran    1.01
OLD        0.95
olds       0.93
‑ (U+2011 non-breaking hyphen)    0.81
oldest     0.80
-'         0.80
Activations Density 0.048%