INDEX
Explanations
proper nouns
mentions of specific entities or abbreviations, particularly related to institutions or organizations
Negative Logits
IZE         -0.70
perate      -0.67
igans       -0.66
Construct   -0.63
IAN         -0.62
ãĥī         -0.61
Monaco      -0.61
Nadu        -0.59
utenant     -0.58
ORIG        -0.58
Positive Logits
ruary       1.09
acteria     1.05
ilib        1.05
razil       1.05
iotics      1.01
axter       1.00
odies       0.97
yssey       0.96
ishop       0.96
amboo       0.95
Activations Density: 0.034%