INDEX
Explanations
praise or criticism for individuals' performances
references to notable individuals and their impact or behavior
Negative Logits
etheless    -0.95
''.         -0.74
":{"        -0.73
"))         -0.70
)))         -0.70
"!          -0.68
]).         -0.66
"?          -0.65
attRot      -0.64
)).         -0.63
Positive Logits
whereas     0.62
averaging   0.57
replaced    0.55
paired      0.55
overhead    0.53
kios        0.51
upfront     0.51
satur       0.50
reps        0.50
VM          0.49
Activations Density 2.391%