INDEX
Explanations
generic phrases representing importance or priority
phrases emphasizing priority or significance
Negative Logits
anmar        -0.82
rams         -0.81
fell         -0.79
yth          -0.77
ijn          -0.76
uador        -0.76
neys         -0.74
isodes       -0.74
Lyn          -0.74
Cosponsors   -0.73
Positive Logits
foremost        0.95
responders      0.75
contributor     0.68
sticking        0.68
importance      0.68
concern         0.67
distinguishes   0.67
cared           0.66
tenance         0.66
addressed       0.65
Activations Density 0.025%