Explanations
references to physical locations or destinations
occurrences of the word "stop" in various contexts
Negative Logits
ighth        -0.80
yss          -0.77
uth          -0.73
ipedia       -0.70
rity         -0.70
aths         -0.69
umbered      -0.68
suscept      -0.68
abund        -0.67
issance      -0.65
Positive Logits
gap           1.02
watch         0.97
oppers        0.81
opping        0.74
shop          0.72
reon          0.72
watching      0.72
overs         0.70
bothering     0.69
ン            0.69
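The two tables above list the vocabulary tokens whose logits this feature pushes down or up most strongly. A minimal sketch of how such a ranking can be produced follows, assuming the common SAE-dashboard convention that each value is the feature's decoder direction multiplied by the model's unembedding matrix. The names `decoder_row`, `W_U`, `vocab` and `top_logit_tokens` are hypothetical, and the dashboard may apply extra normalization not shown here.

```python
import numpy as np

def top_logit_tokens(decoder_row, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (hypothetical helper).

    decoder_row: shape (d_model,), the feature's decoder direction (assumed).
    W_U: shape (d_model, vocab_size), the unembedding matrix (assumed).
    vocab: list of token strings, indexed like W_U's columns.
    """
    # Project the feature direction into vocabulary space: one weight per token.
    logit_weights = decoder_row @ W_U
    # Sort ascending so the most negative weights come first.
    order = np.argsort(logit_weights)
    negative = [(vocab[i], float(logit_weights[i])) for i in order[:k]]
    # Reverse the order to take the most positive weights.
    positive = [(vocab[i], float(logit_weights[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
neg, pos = top_logit_tokens(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 1.0, -0.9],
              [0.3, 0.3, 0.3, 0.3]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('d', -0.9), ('b', -0.2)]
print(pos)  # [('c', 1.0), ('a', 0.5)]
```

Because only the direct path through the unembedding is considered, these weights describe what the feature writes toward the output, not the contexts in which it fires.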
Activation Density: 0.018%
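Activation density is usually read as the fraction of token positions on which the feature is active, so 0.018% corresponds to roughly 18 out of every 100,000 tokens. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0 (the dashboard's exact threshold is not stated here); `activation_density` is a hypothetical helper name.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires (hypothetical helper).

    activations: 1-D array of this feature's activation at each token position.
    threshold: a position counts as active if its activation exceeds this (assumed 0).
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: the feature fires on 1 of 4 positions.
d = activation_density([0.0, 0.5, 0.0, 0.0])
print(f"{d:.3%}")  # 25.000%
```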