Explanations
references to the age of individuals, particularly minors in the context of negative actions or situations
references to ages, particularly those associated with children
Negative Logits
Token      Logit
wikipedia  -0.61
hops       -0.59
lobb       -0.59
torches    -0.59
hesda      -0.58
Apps       -0.57
osponsors  -0.57
zsche      -0.56
VIDEOS     -0.55
ipedia     -0.55
Positive Logits
Token      Logit
olds        1.24
old         1.14
old         1.13
olds        0.94
veteran     0.90
OLD         0.90
Old         0.86
-           0.81
Old         0.80
ago         0.75
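Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding and ranking vocabulary tokens by the result. The sketch below assumes exactly that procedure. The helper `top_logits`, its parameters, and the toy 2-dimensional weights are hypothetical and invented for illustration; they are not this feature's real weights or any dashboard's actual code.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def top_logits(
    decoder_direction: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[Scored, Scored]:
    """Hypothetical helper: score each vocabulary token by the dot product of
    its unembedding vector with the feature's decoder direction, then return
    the k highest (positive) and k lowest (negative) scoring tokens."""
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembed.items()
    }
    # Highest scores first, mirroring the Positive Logits list.
    positive = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    # Lowest (most negative) scores first, mirroring the Negative Logits list.
    negative = sorted(scores.items(), key=lambda item: item[1])[:k]
    return positive, negative


# Toy example: these weights are made up, not taken from a real model.
decoder_direction = [1.0, 0.5]
unembed = {
    "old": [1.0, 0.4],
    "olds": [0.8, 0.2],
    "hops": [-0.4, -0.4],
    "wikipedia": [-0.5, -0.4],
}
positive, negative = top_logits(decoder_direction, unembed, k=2)
print([token for token, _ in positive])  # ['old', 'olds']
print([token for token, _ in negative])  # ['wikipedia', 'hops']
```

With a real model the same two sorts would run over the full vocabulary, which is why a single feature can surface several casing variants of one word.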
Activations Density 0.027%
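Activation density is usually read as the percentage of sampled tokens on which the feature fires. Under that reading, 0.027% (0.00027) corresponds to roughly 27 activating tokens per 100,000. The minimal sketch below assumes a flat list of per-token activations and treats any value above zero as "active". The helper `activation_density` and its `threshold` parameter are hypothetical, not a documented API.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (assumed to be 0.0, i.e. any positive activation)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 27 nonzero activations spread across 100,000 tokens.
acts = [0.0] * 100_000
for i in range(27):
    acts[i * 1000] = 1.5
print(f"{activation_density(acts):.3f}%")  # prints 0.027%
```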