INDEX
Explanations
prepositions
occurrences of the word "of."
Negative Logits
partName        -0.70
appra           -0.67
FW              -0.65
passers         -0.64
MacArthur       -0.62
ridic           -0.59
Prairie         -0.58
multiplication  -0.57
Extras          -0.57
CBI             -0.56
Positive Logits
sky             1.28
rontal          1.17
ield            1.16
lav             1.14
milo            1.03
ortunately      1.02
rame            1.01
ski             0.98
rio             0.98
icial           0.98
Activations Density: 0.026%