INDEX
Explanations
statements about analogies and comparisons between different concepts
Negative Logits
Universities  -0.74
Legions       -0.73
directions    -0.68
newsletters   -0.66
Surve         -0.65
Pers          -0.65
luster        -0.64
Dems          -0.62
Kurds         -0.61
alities       -0.61
Positive Logits
ALWAYS             0.87
agraph             0.81
capable            0.78
typically          0.78
indistinguishable  0.78
usually            0.77
worth              0.77
agine              0.76
inherently         0.75
preferable         0.74
Activations Density: 0.223%