Explanations
pronouns referring to individuals or groups
subjects performing actions related to statements or claims
Negative Logits
ngth          -0.73
geries        -0.72
opl           -0.70
antam         -0.69
ictional      -0.69
anding        -0.66
raints        -0.66
Americas      -0.65
ifference     -0.65
hang          -0.65
Positive Logits
promptly       0.91
vehemently     0.90
deem           0.87
duly           0.84
dubbed         0.83
gladly         0.82
euphem         0.82
dub            0.82
dearly         0.81
ironically     0.81
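The two lists rank tokens by how strongly the feature pushes their output logits down or up. Dashboards of this kind typically derive such lists by projecting the feature's decoder direction onto each token's unembedding vector and keeping the extremes. The sketch below assumes exactly that (it ignores any final layer norm), uses made-up 3-dimensional vectors rather than real model weights, and its helper names `logit_effects` and `top_and_bottom` are hypothetical.

```python
# Hypothetical sketch: rank tokens by the direct logit effect of a feature,
# assuming effect = dot(decoder direction, token's unembedding vector).

def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembed[token])} for every token.

    decoder_dir: list of floats, the feature's output direction.
    unembed: dict mapping token string -> list of floats of the same length.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with invented vectors (not real model weights).
decoder_dir = [1.0, 0.5, -0.5]
unembed = {
    "promptly": [0.6, 0.4, -0.2],
    "gladly": [0.5, 0.3, -0.1],
    "ngth": [-0.4, -0.2, 0.3],
    "opl": [-0.3, -0.1, 0.4],
}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print([tok for tok, _ in positive])
print([tok for tok, _ in negative])
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid a full sort.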
Activation Density: 0.183%
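The density figure is a sparsity measure. Assuming it is the percentage of tokens in the sampled dataset on which the feature's activation is strictly positive, it can be computed as below; the function name `activation_density` is hypothetical, and the activation values are invented for illustration. Under that assumption, 0.183% means the feature fires on roughly 1 token in 546.

```python
# Hypothetical sketch: activation density as the percentage of tokens on
# which the feature fires (activation strictly greater than zero).

def activation_density(activations):
    """Return the percentage of activations that are > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Toy example: a feature that fires on 2 of 1,000 tokens.
acts = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(acts):.3f}%")  # 0.200%

# Inverting the reported figure: 0.183% is about 1 token in 546.
print(round(100 / 0.183))  # 546
```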