Explanations
proper nouns, specifically names like "Joss" and "Graham"
the word "oss" in various contexts
Negative Logits
ャ           -0.79
ufact       -0.71
覚醒        -0.69
ropolitan   -0.67
mate        -0.65
ric         -0.64
Beast       -0.63
assies      -0.61
RO          -0.61
azon        -0.60
Positive Logits
Whedon      1.23
essed       1.06
essing      1.03
es          0.96
edIn        0.94
enger       0.90
essor       0.90
essors      0.87
ips         0.85
idents      0.83
Activations Density 0.019%