INDEX
Explanations
comparisons between different entities, where one entity is rated higher or lower than the other
comparisons to the word "other" in various contexts
Negative Logits
oke         -0.81
uko         -0.71
ossession   -0.67
OA          -0.66
orney       -0.64
ony         -0.64
finally     -0.64
went        -0.63
enance      -0.62
hyde        -0.62
Positive Logits
worldly     1.21
kinds       0.97
types       0.91
iator       0.84
forms       0.83
incarn      0.83
iating      0.83
countries   0.82
facets      0.82
continents  0.81
Activations Density: 0.058%