INDEX
Explanations
Analogies that compare one thing to another using phrases such as "like", "is like", or "akin to"
Negative Logits
formance     -0.79
amily        -0.78
icipated     -0.75
vo           -0.71
merce        -0.71
furthermore  -0.69
Site         -0.69
legates      -0.68
icip         -0.68
dor          -0.68
Positive Logits
aspirin      1.04
Titanic      0.96
747          0.88
cigarettes   0.87
steroids     0.87
chess        0.86
Hitler       0.84
Napoleon     0.83
typew        0.83
candy        0.83
Activations Density 0.736%