Explanations
terms related to pardons and amnesty
pardon, Amnesty, parm
Negative Logits
U           -0.47
He          -0.46
zzleHttp    -0.44
Ria         -0.43
he          -0.43
Li          -0.42
Things      -0.42
HS          -0.42
MC          -0.41
Louisa      -0.41
Positive Logits
pardon      1.98
Pardon      1.85
Pardon      1.66
pardon      1.61
pardoned    1.45
pard        1.42
Pard        1.27
perdón      0.96
pard        0.93
perdon      0.90
Activations Density: 0.004%