INDEX
Explanations
locations or place names
proper nouns, specifically names or titles of people and places
Negative Logits
exceptions     -0.74
circuits       -0.65
poles          -0.61
nons           -0.61
mainline       -0.60
tracts         -0.60
loopholes      -0.60
certainty      -0.58
necessities    -0.58
criminals      -0.58
Positive Logits
i      1.93
a      1.87
aan    1.59
e      1.56
eh     1.52
icz    1.50
iya    1.48
aq     1.47
oi     1.46
o      1.46
Activations Density: 0.307%