Explanations
age descriptions in terms of years, which are often related to allegations or incidents
references to the ages of individuals involved in serious incidents, particularly minors
Negative Logits
token       logit
lobb        -0.62
akin        -0.60
Dialogue    -0.59
Parables    -0.56
hops        -0.56
morrow      -0.56
omsky       -0.55
balcon      -0.55
essler      -0.55
peak        -0.55
Positive Logits
token       logit
olds        1.03
old         0.82
OLD         0.80
veteran     0.78
old         0.76
olds        0.73
ago         0.71
pregnant    0.66
iversary    0.66
ieving      0.65
Activations Density 0.060%
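
A minimal sketch of how a density figure like the one above could be computed, assuming it means the percentage of tokens on which the feature has a nonzero activation. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: 10,000 tokens, 6 of which activate the feature.
sample = [0.0] * 9994 + [1.2, 0.4, 3.1, 0.7, 2.2, 0.9]
print(f"{activation_density(sample):.3f}%")
```

A density this low would mean the feature fires on only a handful of tokens per ten thousand, which fits a narrow feature tied to age expressions.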