Explanations
references to specific individuals in context
Negative Logits
anmar       -0.80
ifted       -0.76
arcity      -0.76
glim        -0.76
ifies       -0.75
uitous      -0.74
ENCY        -0.74
ifying      -0.73
ific        -0.73
committee   -0.73
Positive Logits
lla         0.93
ette        0.89
que         0.85
ttes        0.83
lli         0.80
vre         0.80
llo         0.79
brate       0.77
Hebdo       0.76
brates      0.76
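The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. One common way to produce such lists, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest scores. The minimal Python sketch below illustrates only that ranking step. `logit_effects`, `top_and_bottom`, and the two-dimensional toy vectors are hypothetical and do not reproduce the values listed above.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (an assumption about
    how the dashboard values are derived)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(scores: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (bottom_k, top_k): most negative first, most positive first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]
    top = list(reversed(ranked[-k:]))
    return bottom, top


# Hypothetical toy example: a 2-d decoder direction and four token vectors.
decoder_dir = [1.0, -0.5]
unembed = {
    "lla": [0.9, -0.1],
    "ette": [0.8, 0.0],
    "anmar": [-0.6, 0.4],
    "committee": [0.0, 1.0],
}
scores = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(scores, k=2)
print("Negative:", [t for t, _ in negative])  # ['anmar', 'committee']
print("Positive:", [t for t, _ in positive])  # ['lla', 'ette']
```

Sorting once and slicing both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as a top-k routine would avoid a full sort.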
Activation density: 0.008%
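A density of 0.008% means the feature fires on roughly 8 of every 100,000 tokens. Assuming density is measured as the fraction of token positions whose activation exceeds zero over some sample of text (the zero threshold and the sample are assumptions, not stated on this page), a minimal Python sketch with a hypothetical `activation_density` helper is:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds
    `threshold`. Returns 0.0 for an empty input."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0


# Toy check: 8 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 12_500] = 1.0  # spread the 8 activations across the sequence
density = activation_density(acts)
print(f"{density:.3%}")  # 0.008%
```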