INDEX
Explanations
official statements, memos, and emails
Negative Logits
encounters   -0.60
races        -0.59
breeds       -0.59
extremes     -0.58
edges        -0.56
Flavoring    -0.56
journeys     -0.55
outcomes     -0.55
spills       -0.55
Yards        -0.54
Positive Logits
titled        1.05
thanking      0.94
entitled      0.93
stating       0.92
outlining     0.90
denouncing    0.88
urging        0.88
apologizing   0.86
saying        0.84
accusing      0.84
Activations Density 0.188%