INDEX
Explanations
mentions of specific names
proper nouns, particularly names and organizations
Negative Logits
©¶æ           -0.92
Centauri      -0.84
semic         -0.70
pores         -0.67
prof          -0.64
dism          -0.62
combustion    -0.62
âĶĢâĶĢ        -0.62
arthed        -0.61
Plants        -0.60
Positive Logits
ulum          1.01
ster          0.87
lest          0.83
deck          0.83
ees           0.82
y             0.81
pillar        0.81
caster        0.81
pill          0.80
sters         0.80
Activations Density 0.058%
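The density figure above can be read as the share of token positions on which the feature fires. The sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample counts are all hypothetical, with the counts chosen merely to reproduce a value of the same size, and none of it is taken from the dashboard's own pipeline.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 58 of which activate.
sample = [0.0] * 99_942 + [1.0] * 58
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is silent on the overwhelming majority of tokens, which is typical of sparse features.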