Explanations
phrases related to political figures or actions
nouns related to roles, positions, and professions
Negative Logits
Doors        -0.71
earable      -0.68
[|           -0.67
doors        -0.65
angan        -0.64
everlasting  -0.62
cour         -0.62
fml          -0.62
Content      -0.60
Amendments   -0.60
Positive Logits
glers        0.80
rers         0.77
lier         0.71
arers        0.71
ifier        0.70
ifiers       0.69
kit          0.66
rator        0.66
advocate     0.66
cares        0.64
Activations Density: 0.647%