Explanations
Words related to the spatial description of structures or features, potentially in architecture or machinery.
NEGATIVE LOGITS
Token    Logit
é¾įå¥ij士    -0.93
ģĸ    -0.71
pires    -0.67
roo    -0.67
ourke    -0.65
atever    -0.64
STD    -0.63
olean    -0.62
Thou    -0.62
eur    -0.62
POSITIVE LOGITS
Token    Logit
themselves    1.33
individually    0.96
collectively    0.95
selves    0.89
uniformly    0.89
numbered    0.88
evenly    0.83
scattered    0.78
dispersed    0.77
rotated    0.74
Activation Density: 0.538%
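
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below illustrates that reading only; `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 1000 tokens, the feature fires on 5 of them.
sample = [0.0] * 995 + [0.4, 1.2, 0.7, 2.1, 0.3]
print(f"{activation_density(sample):.3%}")  # prints 0.500%
```

Under this reading, a density of 0.538% would mean the feature is active on roughly 5 of every 1000 sampled tokens.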