Explanations
mentions of locations or settings towards the sea or mountains
instances of the suffix "ide" in words
Negative Logits
ente       -0.77
bart       -0.65
alt        -0.65
Eug        -0.65
assian     -0.64
Gay        -0.64
Cald       -0.61
Col        -0.61
Raw        -0.61
fred       -0.60
Positive Logits
cery        0.72
activ       0.72
boarding    0.71
lihood      0.70
behaviours  0.70
edIn        0.69
behaviour   0.66
cloth       0.65
ificant     0.64
rily        0.64
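On dashboards of this kind, the two logit lists are typically produced by scoring every vocabulary token by the feature's direct effect on the output logits and then keeping the most negative and most positive scores. The sketch below shows only the ranking step. It assumes a hypothetical dict `logit_effects` that maps token strings to scores; the handful of values in it are copied from the lists above, and the function name `top_and_bottom` is illustrative, not part of any dashboard API.

```python
import heapq


def top_and_bottom(logit_effects: dict[str, float], k: int) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    items = list(logit_effects.items())
    # Smallest scores first: these become the "Negative Logits" list.
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    # Largest scores first: these become the "Positive Logits" list.
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return negative, positive


# Hypothetical subset of scores copied from the lists above;
# a real dashboard would rank the full vocabulary.
logit_effects = {
    "ente": -0.77, "bart": -0.65, "Cald": -0.61, "fred": -0.60,
    "cery": 0.72, "lihood": 0.70, "cloth": 0.65, "rily": 0.64,
}

negative, positive = top_and_bottom(logit_effects, k=2)
print(negative)
print(positive)
```

`heapq.nsmallest` and `heapq.nlargest` avoid sorting the whole vocabulary when only a few entries are needed. That matters when the score vector has tens of thousands of entries.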
Activations Density 0.109%
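Activation density is usually read as the fraction of sampled token positions on which the feature fires. Under that reading, 0.109% means the feature is active on roughly 1 token in 900. The following is a minimal sketch that assumes "fires" means an activation strictly above a threshold of 0. Both the threshold and the helper name `activation_density` are assumptions, and the data is a hypothetical toy example, not this feature's real activations.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, and the feature fires on 11 of them.
acts = [0.0] * 10_000
for i in range(11):
    acts[i * 900] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")
```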