INDEX
Explanations
phrases related to describing relationships, especially in terms of comparison and contrast
Negative Logits
OGR              -0.87
unc              -0.83
gow              -0.81
vous             -0.73
nell             -0.72
TPPStreamerBot   -0.71
nit              -0.71
bye              -0.71
quished          -0.67
kamp             -0.67
Positive Logits
sexes            1.21
genders          1.07
halves           0.97
extremes         0.86
sides            0.84
factions         0.81
combatants       0.80
eras             0.76
two              0.75
conflicting      0.71
Activations Density 0.857%
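
As a rough illustration of how a density figure like the one above could be computed, the sketch below assumes that "activations density" means the fraction of token positions on which the feature fires (activation above zero). That definition, the function name `activation_density`, and the toy data are assumptions made for illustration; the page itself does not state how the number was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (# positions with activation > threshold) / (# positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 6 of 700 token positions.
acts = [0.0] * 694 + [1.3, 0.2, 4.1, 0.7, 2.5, 0.9]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.857%
```

A density this low would mean the feature is sparse: it is silent on the vast majority of tokens and fires only in a narrow set of contexts.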