INDEX
Explanations
Components of proper noun phrases, particularly those related to locations and entities.
Negative Logits
539         -0.16
Tib         -0.16
eryl        -0.15
lify        -0.15
arna        -0.14
unning      -0.14
æĭĶ         -0.14
cast        -0.14
utches      -0.14
uffled      -0.13
Positive Logits
bject       0.17
ODB         0.17
wart        0.15
elop        0.14
ignon       0.14
_TRAN       0.14
аж          0.14
ãĥ©ãĤ¤ãĥ³   0.14
zag         0.14
adden       0.13
Activations Density: 0.015%
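
Two of the tokens listed above (æĭĶ and ãĥ©ãĤ¤ãĥ³) look like the printable stand-ins that byte-level BPE vocabularies use for raw UTF-8 bytes. The sketch below shows one way to turn such strings back into readable text. It assumes a GPT-2-style byte-to-character table, which this page does not confirm, and the helper names (bytes_to_unicode, decode_token) are illustrative rather than part of any dashboard API. Tokens that contain characters outside the table, such as the Cyrillic аж, are returned unchanged.

```python
def bytes_to_unicode():
    """Build the byte -> printable character table (assumed GPT-2-style byte-level BPE)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Reverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to text; leave it unchanged if it is not byte-encoded."""
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        return token  # already readable text (hypothetical fallback for mixed displays)
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A token may hold only part of a multi-byte character, so replace invalid sequences.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["æĭĶ", "ãĥ©ãĤ¤ãĥ³", "bject", "аж"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĥ©ãĤ¤ãĥ³ decodes to ライン and æĭĶ to 拔, while plain ASCII tokens such as bject pass through unchanged.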