INDEX
Explanations
phrases related to the discussion or quoting of various perspectives or statements by different individuals
Negative Logits
ãĥİ            -0.72
atible         -0.64
otin           -0.63
paralle        -0.62
Fit            -0.61
estine         -0.60
lor            -0.60
ILCS           -0.59
Appearances    -0.59
imilar         -0.58
Positive Logits
sarcast        1.06
bluntly        0.98
rhet           0.94
emphatically   0.81
diplom         0.81
aloud          0.76
.              0.75
confidently    0.73
bitterly       0.71
omin           0.71
Activations Density 0.458%
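
The density figure is most naturally read as the share of sampled token positions on which the feature fires. A minimal sketch of that calculation follows, assuming "density" means the fraction of positions whose activation exceeds a threshold. The function name `activation_density`, the default threshold of zero, and the toy data are illustrative assumptions, not taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of activations strictly greater than `threshold`.

    Assumption: a feature counts as "active" at a position when its
    activation is above the threshold (zero by default).
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Toy data (hypothetical): 3 active positions out of 1000.
toy = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(toy):.3%}")  # prints 0.300%
```

Under this reading, a density of 0.458% would correspond to the feature being active on roughly 46 of every 10,000 sampled token positions.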