INDEX
Explanations
sentences where people agree on a certain topic
phrases indicating consensus or agreement
Negative Logits
nig        -0.89
Enlarge    -0.68
,)         -0.68
ILCS       -0.67
)|         -0.66
/+         -0.66
scratch    -0.65
*)         -0.65
,[         -0.63
\/\/       -0.61
Positive Logits
etheless   0.97
ullivan    0.73
onia       0.72
arthed     0.69
cautiously 0.67
ESCO       0.67
risome     0.66
ufact      0.65
eworks     0.64
osuke      0.62
Activations Density: 0.395%