INDEX
Explanations
phrases indicating challenges and opportunities related to support or assistance for various groups or individuals
Head Attr Weights
0:0.01
1:0.14
2:0.13
3:0.11
4:0.01
5:0.02
6:0.09
7:0.10
8:0.07
9:0.07
10:0.09
11:0.10
Negative Logits
ACY     -1.13
ilion   -1.04
oria    -1.04
atum    -1.03
acy     -1.01
atem    -1.00
Mill    -1.00
ravity  -0.99
riot    -0.99
asse    -0.99
Positive Logits
racuse       1.13
Reviewer     1.12
Django       1.05
SOS          1.04
frogs        1.04
unprotected  1.03
democracies  1.02
pick         0.99
wart         0.98
commissions  0.98
Activations Density 0.022%