INDEX
Explanations
references to specific individuals or entities associated with political, economic, and entertainment contexts
Head Attribution Weights (head:weight)
0:0.02
1:0.02
2:0.10
3:0.16
4:0.32
5:0.04
6:0.03
7:0.04
8:0.02
9:0.05
10:0.08
11:0.05
Negative Logits
iggurat      -1.92
proof        -1.70
=>           -1.58
simulated    -1.57
uted         -1.56
icester      -1.54
achable      -1.48
mir          -1.48
appropriate  -1.46
cooked       -1.45
Positive Logits
sake         2.12
purposes     2.02
coffers      1.97
aspiring     1.87
dearly       1.83
ankind       1.79
fledgling    1.79
newcomers    1.79
wat          1.79
$.           1.77
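The two logit lists above are the extremes of a per-token effect ranking: the tokens this feature most promotes and most suppresses. The sketch below shows only that selection step. How the dashboard computes each token's effect is not stated here, so the effect mapping is taken as given input, and the function name top_and_bottom_logits is hypothetical.

```python
from typing import Dict, List, Tuple


def top_and_bottom_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs.

    Hypothetical helper: `effects` is an assumed token -> logit-effect mapping.
    """
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    bottom = ranked[:k]        # most negative effects, strongest first
    top = ranked[::-1][:k]     # most positive effects, strongest first
    return top, bottom


# A few values copied from the lists above, used as sample input.
sample = {
    "sake": 2.12, "purposes": 2.02, "coffers": 1.97,
    "iggurat": -1.92, "proof": -1.70, "=>": -1.58,
}
top, bottom = top_and_bottom_logits(sample, k=2)
print(top)     # [('sake', 2.12), ('purposes', 2.02)]
print(bottom)  # [('iggurat', -1.92), ('proof', -1.7)]
```

Sorting once and slicing from both ends keeps the example short. For a full vocabulary, a partial selection (for example heapq.nlargest and heapq.nsmallest) would avoid sorting every token.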
Activation Density: 0.280%
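Read as a fraction, a density of 0.280% means the feature fires on roughly 28 of every 10,000 tokens. The sketch below assumes density is the share of tokens whose activation is strictly above zero. That definition is an assumption, not something stated on this page, and activation_density is a hypothetical helper name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as "active" when its activation > threshold.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Illustrative data: 280 active tokens out of 100,000.
acts = [0.0] * 99_720 + [1.0] * 280
density = activation_density(acts)
print(f"{density:.3%}")  # 0.280%
```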