INDEX
Explanations
mentions of specific universities, academic positions, and organizations
sentences that represent various academic affiliations or professional roles
Negative Logits
sleeper      -0.74
toy          -0.68
silhouette   -0.67
closet       -0.67
mummy        -0.66
grunt        -0.65
wardrobe     -0.65
optional     -0.64
phantom      -0.64
slightest    -0.64
Positive Logits
Their          0.95
Previously     0.95
Among          0.91
Essentially    0.91
Though         0.90
Specifically   0.90
Increasing     0.89
Those          0.89
Conversely     0.89
Others         0.89
Activations Density: 0.206%