INDEX
Explanations
words related to locations or places, often including distances and directions
occurrences of the word 'ind'
Negative Logits
Token          Logit
itton          -0.72
xit            -0.68
senal          -0.66
depreciation   -0.63
tremend        -0.62
prus           -0.62
avorite        -0.61
ngth           -0.60
compr          -0.59
iannopoulos    -0.59
Positive Logits
Token          Logit
erella         1.17
icative        1.16
ebted          1.13
irect          1.11
sight          1.09
icator         1.07
ividually      1.01
icators        0.99
s              0.99
ications       0.98
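A quick way to sanity-check the explanation "occurrences of the word 'ind'" against the positive logits is to prepend "ind" to each listed token and see whether a recognizable word (or word fragment) results. The sketch below does only that. The `complete` helper is hypothetical, and treating these tokens as continuations of "ind" is an assumption, not something the listing states.

```python
# Top positive-logit tokens copied from the listing above: (token, logit).
positive_logits = [
    ("erella", 1.17),
    ("icative", 1.16),
    ("ebted", 1.13),
    ("irect", 1.11),
    ("sight", 1.09),
    ("icator", 1.07),
    ("ividually", 1.01),
    ("icators", 0.99),
    ("s", 0.99),
    ("ications", 0.98),
]


def complete(token: str, context: str = "ind") -> str:
    """Hypothetical helper: join an assumed preceding context with a token."""
    return context + token


# Assumption: each token is read as the continuation of a preceding "ind".
completions = [complete(token) for token, _ in positive_logits]

for (token, logit), word in zip(positive_logits, completions):
    print(f"{token:<12}{logit:>6.2f}  ind+{token} = {word}")
```

Under that assumption most tokens complete to ordinary words such as "indicative", "indebted", "indirect", and "individually", while "inderella" and "indsight" read as the tails of "Cinderella" and "hindsight".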
Activations Density 0.018%