Explanations
Phrases related to discussions or arguments, especially in the context of sports or conflicts, in which strong opinions are expressed.
Negative Logits
Token          Logit
preval         -0.73
transact       -0.71
unsus          -0.71
concess        -0.70
unlucky        -0.69
abundantly     -0.69
intentional    -0.67
clerks         -0.67
silly          -0.67
naughty        -0.67
Positive Logits
Token            Logit
"…               1.13
"...             1.12
"(               1.07
Asked            1.04
"[               1.03
"'               1.01
Adds             0.98
However          0.93
<|endoftext|>    0.91
↵                0.90
Activations Density 0.186%
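
The page does not state how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the feature's activation is non-zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature
    fires at all. The dashboard itself does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations: the feature fires on 2 of 1000 token positions.
sample = [0.0] * 998 + [1.2, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this definition, a density of 0.186% would mean the feature is active on roughly 2 of every 1,000 tokens, that is, a sparse feature.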