Explanations
sentences related to laws, policies, and legal actions
Negative Logits
sleeper  -0.75
toy  -0.68
wardrobe  -0.67
closet  -0.66
optional  -0.66
ditch  -0.65
dummy  -0.65
phantom  -0.64
silhouette  -0.64
juice  -0.64
Positive Logits
Their  0.99
Though  0.97
Previously  0.96
Among  0.95
Speaking  0.94
Essentially  0.94
Having  0.93
His  0.93
Similarly  0.93
Whereas  0.91
Activation Density: 0.264%