Explanations
locations or places near borders
the word "the" and phrases that frequently include it, suggesting a focus on definite articles in descriptive contexts
Negative Logits
anew            -0.74
rix             -0.70
frey            -0.69
fully           -0.68
ardi            -0.66
thereafter      -0.65
ophobia         -0.64
eson            -0.64
indeed          -0.64
distinguishes   -0.63
Positive Logits
entrance        1.09
horizon         0.99
periphery       0.98
outskirts       0.97
nearest         0.95
main            0.94
walls           0.92
town            0.90
intersection    0.89
perimeter       0.89
Activations Density 0.217%