INDEX
Explanations
phrases related to features, characteristics, or aspects of something
affirmative statements about the existence or presence of certain features or items
Negative Logits
Token        Logit
erva         -0.75
erity        -0.70
Hast         -0.68
culosis      -0.68
heid         -0.67
cence        -0.65
ision        -0.64
Ł            -0.63
iliation     -0.62
formation    -0.62
Positive Logits
Token        Logit
pertinent    1.24
relevant     1.22
important    1.16
noteworthy   1.14
interesting  1.11
overlooked   1.08
salient      1.07
helpful      1.04
staples      1.03
worthwhile   1.03
Activations Density: 0.214%