INDEX
Explanations
sentences with the word "are" followed by adjectives or adverbs
phrases indicating the existence or presence of entities and groups
Negative Logits
irmation        -0.68
Supports        -0.65
unfocusedRange  -0.63
orthern         -0.62
ulence          -0.62
itiveness       -0.61
Horizon         -0.61
abil            -0.60
iton            -0.60
iture           -0.60
Positive Logits
increasingly    1.08
scrambling      1.06
accustomed      1.06
understandably  1.02
clam            1.02
encouraged      1.02
often           0.98
outraged        0.98
fascinated      0.97
dying           0.95
Activations Density: 0.205%