INDEX
Explanations
references to public statements made in formal settings such as interviews and speeches
references to interviews and speeches in the context of political or public statements
Negative Logits
''.        -0.56
wd         -0.54
angs       -0.54
}}}        -0.53
pes        -0.53
default    -0.52
.''.       -0.51
edges      -0.51
=#         -0.50
udic       -0.50
Positive Logits
that       1.08
that       1.07
:"         0.78
"â̦        0.77
"[         0.76
how        0.74
whether    0.71
why        0.71
è¦ļéĨĴ     0.70
"'         0.70
Activations Density 0.206%