Explanations
phrases related to political events or policies
Negative Logits
tyr        -0.69
arsen      -0.64
avorite    -0.60
antine     -0.60
Export     -0.60
antry      -0.60
Minotaur   -0.60
Inqu       -0.59
oreal      -0.57
Metallic   -0.57
Positive Logits
stretched  1.36
fitted     1.34
smart      1.12
wards      1.10
casts      1.07
doors      1.07
fitting    1.05
bur        0.99
skirts     0.99
posts      0.99
Activations Density: 0.981%