INDEX
Explanations
phrases indicating a comparison or relationship between different entities or concepts
the word "the" and its variations in different contexts
Negative Logits
Canaver   -0.78
ourney    -0.75
%%%%      -0.74
anew      -0.72
azel      -0.71
NB        -0.68
entimes   -0.67
elman     -0.67
perm      -0.66
dar       -0.66
Positive Logits
sexes            1.47
ages             0.92
extremes         0.91
genders          0.89
aforementioned   0.87
parties          0.84
respective       0.83
latter           0.82
factions         0.81
two              0.77
Activations Density 0.072%