INDEX
Explanations
phrases or words related to comparisons
instances of the word "against"
Negative Logits
hops            -0.91
overed          -0.81
liner           -0.77
aird            -0.76
operated        -0.76
DragonMagazine  -0.75
jri             -0.74
lus             -0.72
details         -0.71
assy            -0.71
Positive Logits
whom            0.83
anybody         0.80
against         0.77
them            0.69
somebody        0.67
behalf          0.67
anyone          0.66
Humanity        0.65
adversity       0.65
Against         0.65
Activations Density 0.046%