Explanations
references to various social or cultural phenomena and their implications
Head Attr Weights
0:0.07
1:0.02
2:0.13
3:0.06
4:0.09
5:0.08
6:0.02
7:0.03
8:0.19
9:0.06
10:0.07
11:0.13
Negative Logits
tein: -1.66
heet: -1.55
onom: -1.47
apter: -1.47
ler: -1.45
heed: -1.40
emort: -1.37
tera: -1.37
athy: -1.36
farious: -1.35
Positive Logits
soDeliveryDate: 1.78
depending: 1.66
runners: 1.52
enthusi: 1.42
inline: 1.40
Interstitial: 1.39
ONES: 1.34
depending: 1.33
Schwarzenegger: 1.32
respectively: 1.32
Activations Density 0.018%