INDEX
Explanations
phrases or sentences related to endorsements, opinions or decisions
combinations of conjunctions or phrases that indicate lists or series
Negative Logits
Token      Logit
Actor      -0.62
Starts     -0.61
,          -0.60
uces       -0.60
STON       -0.59
!/         -0.57
Enlarge    -0.57
(>         -0.56
-,         -0.55
worldly    -0.55
Positive Logits
Token       Logit
are         1.29
have        1.18
were        1.17
deserve     1.16
have        1.10
aren        1.10
seem        1.07
constitute  1.07
appear      1.07
were        1.06
Activations Density 0.169%