INDEX
Explanations
phrases related to arguments and disagreements
phrases related to disagreements or debates
Negative Logits
ionage    -0.83
ngth      -0.82
iaries    -0.81
vik       -0.72
OGR       -0.72
ortun     -0.69
ROR       -0.68
IJ        -0.68
--+       -0.67
IENCE     -0.66
Positive Logits
whether        1.08
legality       0.83
semantics      0.80
how            0.79
whether        0.78
merits         0.78
topics         0.71
definitions    0.71
footing        0.71
nuances        0.70
Activations Density 0.185%