Explanations
phrases that indicate ownership or possession
Negative Logits
igue          -0.90
iple          -0.76
ean           -0.74
anu           -0.72
Cosponsors    -0.71
Streamer      -0.71
inet          -0.70
arta          -0.69
icer          -0.68
iste          -0.68
Positive Logits
preserving    0.90
protecting    0.83
those         0.81
maintaining   0.78
theirs        0.75
boosting      0.74
whichever     0.74
keeping       0.74
improving     0.71
avoiding      0.70
Activations Density: 0.044%