INDEX
Explanations
instances of the word "them" in phrases
references to groups of people or entities described collectively
Negative Logits
RTX         -0.75
Rush        -0.70
Press       -0.67
Charg       -0.66
••          -0.66
politics    -0.63
Barn        -0.63
Sadd        -0.62
Fine        -0.62
Deal        -0.62
Positive Logits
atic        1.04
atically    0.97
selves      0.91
perished    0.79
alian       0.74
selves      0.74
clustered   0.73
succeeded   0.73
sprinkled   0.73
atics       0.72
Activations Density 0.037%