INDEX
Explanations
instances of the word "run," especially in contexts related to fleeing or escaping
NEGATIVE LOGITS
Token     Logit
alam      -0.70
ayers     -0.62
Birth     -0.61
repre     -0.61
ortium    -0.61
Emer      -0.59
grave     -0.59
olia      -0.59
Lauder    -0.59
plur      -0.58
POSITIVE LOGITS
Token     Logit
aways      1.03
swick      1.00
escape     0.98
gs         0.97
ways       0.91
ners       0.89
dy         0.88
af         0.86
nin        0.85
running    0.83
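Dashboards of this kind usually list the tokens whose output logits the feature most raises or lowers when its decoder direction is written into the residual stream. The following is a minimal sketch that assumes each score is the dot product of the feature's decoder vector with the corresponding column of the unembedding matrix W_U, ignoring any final layer norm or bias. This is a common convention, but the page does not confirm it. The function names and the toy vectors and vocabulary are hypothetical illustration data, not the values listed above.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    """Dot product of one feature's decoder vector with every unembedding column.

    Assumption: decoder_dir has length d_model and w_u is a d_model x vocab_size
    matrix (rows = residual-stream dimensions, columns = vocabulary tokens).
    """
    d_model = len(decoder_dir)
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

def top_and_bottom(
    effects: List[float], tokens: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative tokens, rounded to 2 decimals."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending by score
    positive = [(tokens[t], round(effects[t], 2)) for t in reversed(order[-k:])]
    negative = [(tokens[t], round(effects[t], 2)) for t in order[:k]]
    return positive, negative

# Hypothetical toy data: d_model = 2, vocabulary of 4 tokens.
decoder_dir = [1.0, -0.5]
w_u = [
    [0.9, -0.3, 0.1, 0.0],
    [-0.2, 0.4, 0.0, 1.4],
]
tokens = ["flee", "blue", "the", "chair"]

positive, negative = top_and_bottom(logit_effects(decoder_dir, w_u), tokens, k=2)
print("POSITIVE", positive)  # POSITIVE [('flee', 1.0), ('the', 0.1)]
print("NEGATIVE", negative)  # NEGATIVE [('chair', -0.7), ('blue', -0.5)]
```

In a real model the same projection would be a single matrix-vector product over the full vocabulary. Plain lists are used here only to keep the sketch dependency-free.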
Activation density: 0.622%
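The density figure is the share of token positions on which the feature fires. The sketch below is minimal and rests on two assumptions. First, "active" means an activation strictly above zero, which is typical for a ReLU sparse autoencoder. Second, the dashboard reports the share as a percentage of all sampled token positions. The page states neither the threshold nor the sample size. The function name and the counts below are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0.0)."""
    if len(activations) == 0:
        raise ValueError("need at least one token position")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: 622 active positions out of 100,000 tokens.
acts = [0.0] * 99_378 + [1.3] * 622
print(f"Activation density: {activation_density(acts):.3f}%")  # Activation density: 0.622%
```

A density this low means the feature is silent on over 99% of tokens. This fits an explanation tied to one specific word sense.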