Explanations
words related to legal and political entities
possessive forms and contractions
Negative Logits
Token           Logit
ournals         -0.87
agues           -0.70
planet          -0.68
apart           -0.67
imedia          -0.67
ographers       -0.65
cloth           -0.64
女              -0.64
ingred          -0.64
genre           -0.63
Positive Logits
Token           Logit
refusal         1.38
inability       1.34
efforts         1.31
attempts        1.30
insistence      1.29
decision        1.29
attempt         1.28
unwillingness   1.23
involvement     1.21
reluctance      1.21
Activations Density: 0.271%