INDEX
Explanations
phrases introducing biographical information
occurrences of the verb "was" indicating past actions or states
Negative Logits
entails      -0.67
holders      -0.65
antioxid     -0.63
ological     -0.63
Which        -0.62
Millions     -0.62
Make         -0.61
IMAGES       -0.60
HAVE         -0.59
sexes        -0.59
Positive Logits
hes          1.04
able         1.00
born         0.97
wolves       0.92
wolf         0.92
nt           0.91
briefed      0.90
originally   0.86
sentenced    0.85
awarded      0.85
Activations Density: 0.388%