Explanations
phrases where information or opinions are being highlighted or emphasized
phrases indicating acknowledgment or observation
Negative Logits
quer              -0.93
ILCS              -0.78
imeters           -0.76
enic              -0.75
ene               -0.72
agnetic           -0.72
ibur              -0.71
ール (ãĥ¼ãĥ«)     -0.69
taboola           -0.69
nect              -0.69
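The token ãĥ¼ãĥ« in the negative-logit list is most likely not corruption but a GPT-2-style byte-level BPE token printed in its raw form, where each byte is shown as one printable character. This is an assumption: the page does not say which model or tokenizer the feature comes from. The sketch below rebuilds a byte-to-character table in the style of GPT-2's byte-level tokenizer and inverts it. The helper names `bytes_to_unicode` and `decode_token` are illustrative choices for this example.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the style of GPT-2's byte-level BPE."""
    # Bytes that are already printable keep their own character.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Remaining bytes (control chars, space, etc.) are shifted to code points 256, 257, ...
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


def decode_token(token: str) -> str:
    """Turn a displayed byte-level token back into readable UTF-8 text."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A single token can hold only part of a multi-byte character, so don't fail on it.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ¼ãĥ«"))  # ール
print(decode_token("Ġhow"))    # " how" (Ġ encodes a leading space)
```

Under this assumption, ãĥ¼ãĥ« corresponds to the bytes E3 83 BC E3 83 AB, which is the UTF-8 encoding of the katakana ール.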
Positive Logits
how               1.04
similarities      1.04
inconsistencies   1.03
discrepancies     0.99
that              0.97
shortcomings      0.90
differences       0.89
inaccur           0.87
prominently       0.85
approving         0.85
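Feature dashboards like this one commonly produce the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding and keeping the highest- and lowest-scoring tokens. Whether this page does exactly that, or also folds in a final layer norm or bias, is not stated, so the sketch below should be read as an assumption. The names `decoder_dir` and `unembed`, and all vector values, are hypothetical toy data rather than the model's actual weights.

```python
def logit_effects(decoder_dir: list[float], unembed: dict[str, list[float]]) -> dict[str, float]:
    """Score each vocab token by <decoder direction, token's unembedding vector>."""
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects: dict[str, float], k: int) -> tuple[list[str], list[str]]:
    """Return the k most-promoted and the k most-suppressed tokens."""
    ranked = sorted(effects, key=effects.get)  # ascending by score
    return ranked[::-1][:k], ranked[:k]


# Toy 2-dimensional example; these vectors are invented, not taken from any model.
decoder_dir = [1.0, -0.5]
unembed = {
    "how": [0.6, -0.4],
    "that": [0.5, 0.0],
    "ene": [0.0, 0.4],
    "quer": [-0.5, 0.6],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # ['how', 'that']
print(negative)  # ['quer', 'ene']
```

A real vocabulary would have tens of thousands of entries, and a matrix library would replace the Python loops, but the ranking step stays the same.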
Activation density: 0.047%
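Activation density is usually read as the share of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the actual threshold and the size of the token sample behind the 0.047% figure are not given on this page. The `activation_density` function and the sample data are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 47 firing positions out of 100,000 tokens.
sample = [0.0] * 99_953 + [1.2] * 47
print(f"{activation_density(sample):.3f}%")  # 0.047%
```

At this density the feature fires on roughly 1 token in every 2,000, which is consistent with a sparse, fairly specific feature.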