INDEX
Explanations
words related to having two of something or involving a pair
references to new entities or categories
Negative Logits
=-=-=-=-=-=-=-=-  -0.72
antry             -0.70
Petra             -0.70
someone           -0.68
FANTASY           -0.68
Abdel             -0.68
uddin             -0.68
rology            -0.67
Emerson           -0.66
Beir              -0.65
Positive Logits
thirds            0.88
apiece            0.86
halves            0.85
streams           0.75
brothers          0.75
iliated           0.74
handled           0.74
sisters           0.74
TDs               0.72
totaling          0.71
Activations Density 0.227%