INDEX
Explanations
occurrences of the word "the"
Head Attr Weights
0:0.27
1:0.02
2:0.01
3:0.06
4:0.03
5:0.12
6:0.05
7:0.01
8:0.28
9:0.06
10:0.01
11:0.01
Negative Logits
spectators    -2.14
Interstitial  -1.94
fixtures      -1.81
candles       -1.79
fans          -1.75
drinkers      -1.69
fixture       -1.66
alike         -1.63
Customers     -1.61
goers         -1.60
Positive Logits
Institute    1.84
consultancy  1.81
University   1.76
UCLA         1.75
NYU          1.75
Academic     1.72
Career       1.70
University   1.68
orgetown     1.66
Springer     1.66
Activations Density 0.010%