INDEX
Explanations
Phrases related to historical figures and their monuments
Negative Logits (token, logit)
ョ  -0.15
ssp  -0.15
го  -0.14
erce  -0.14
Ĺi  -0.14
ercul  -0.13
waste  -0.13
漏  -0.13
swim  -0.13
室  -0.12
Positive Logits (token, logit)
ows  0.16
_fig  0.15
Specifier  0.14
blk  0.14
athe  0.14
idf  0.14
еж  0.14
rieving  0.14
metals  0.13
MetroFramework  0.13
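Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the highest and lowest scores. The page does not say how its values were computed, so the sketch below only assumes that method. `logit_effects`, `w_dec_row` and `w_unembed` are hypothetical names, and the weights in the usage example are random placeholders, not this feature's data.

```python
import numpy as np

def logit_effects(w_dec_row: np.ndarray, w_unembed: np.ndarray, k: int = 10):
    """Return (top_k, bottom_k) lists of (token_index, score) for one feature.

    Assumption: w_dec_row has shape (d_model,) and w_unembed has shape
    (d_model, vocab_size); the score is the feature's direct effect on each logit.
    """
    scores = w_dec_row @ w_unembed          # shape (vocab_size,)
    order = np.argsort(scores)              # ascending order of scores
    bottom = [(int(i), float(scores[i])) for i in order[:k]]        # most negative first
    top = [(int(i), float(scores[i])) for i in order[::-1][:k]]     # most positive first
    return top, bottom

# Usage with random placeholder weights (d_model=4, vocab_size=20).
rng = np.random.default_rng(0)
w_dec_row = rng.normal(size=4)
w_unembed = rng.normal(size=(4, 20))
top, bottom = logit_effects(w_dec_row, w_unembed, k=3)
```

Token indices would then be mapped back to strings with the model's tokenizer.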
Activation density: 0.080%
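Activation density is the share of tokens on which the feature fires. At 0.080%, that is about 8 of every 10,000 tokens. Below is a minimal sketch, assuming "fires" means an activation strictly above zero over a flat array of per-token activations. The page states neither its threshold nor its dataset, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(acts)
    if acts.size == 0:
        raise ValueError("need at least one activation")
    return float(np.count_nonzero(acts > threshold)) / acts.size

# Toy example: the feature fires on 8 of 10,000 token positions.
acts = np.zeros(10_000)
acts[:8] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.080%"
```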