INDEX
Explanations
references to specific locations or places
NEGATIVE LOGITS
encies          -0.78
conflic         -0.71
Seym            -0.71
nces            -0.70
cffff           -0.69
milo            -0.67
alloy           -0.67
illian          -0.67
circumstance    -0.67
anyahu          -0.66
POSITIVE LOGITS
through         1.21
about           1.13
ways            1.04
away            1.02
abouts          0.98
er              0.94
ers             0.94
bow             0.91
way             0.89
own             0.88
Activations Density 0.021%