INDEX
Explanations
data-driven statements or research findings
phrases that indicate research findings or evidence
Negative Logits
etheless     -0.85
iche         -0.83
ascus        -0.75
icultural    -0.72
plete        -0.70
ilit         -0.70
cult         -0.67
theless      -0.67
pton         -0.67
rete         -0.65
Positive Logits
shows        0.99
manship      0.91
biz          0.88
ered         0.86
runners      0.86
heet         0.83
show         0.81
Shows        0.80
runner       0.78
shows        0.77
Activations Density 0.050%