INDEX
Explanations
prepositions indicating direction
conjunctions and phrases indicating relationships
Negative Logits
Token       Logit
Panda       -0.78
Democr      -0.76
Pony        -0.75
Clever      -0.72
Welch       -0.69
Freak       -0.69
Fine        -0.68
Yao         -0.67
Examiner    -0.65
FAT         -0.65
Positive Logits
Token       Logit
rogens      1.04
rogen       1.04
20439       0.89
Against     0.81
Against     0.75
against     0.73
through     0.73
<           0.71
excluding   0.71
near        0.71
Activations Density 0.073%