INDEX
Explanations
mentions of or related to passers-by
references to individuals or groups of people, particularly in contexts related to interaction or observation
Negative Logits
undo       -0.70
ested      -0.68
Forbidden  -0.66
Heads      -0.62
arians     -0.60
Cay        -0.58
Rape       -0.58
ansion     -0.57
Hare       -0.57
Pione      -0.57
Positive Logits
ages       1.01
by         0.95
atures     0.94
lihood     0.89
age        0.85
bly        0.76
bys        0.75
passer     0.73
iors       0.71
iper       0.70
Activations Density: 0.054%