Explanations
phrases indicating a comparison or evaluation of situations or entities
phrases emphasizing multiple perspectives or aspects of a subject
Negative Logits
Token    Logit
ĸļ       -0.71
Saud     -0.70
aband    -0.66
etts     -0.65
itton    -0.65
ansas    -0.63
arez     -0.62
agra     -0.62
iere     -0.60
rade     -0.59
Positive Logits
Token        Logit
resembles    0.95
resemble     0.81
places       0.76
resembled    0.75
hops         0.75
terness      0.71
horr         0.70
resemb       0.68
surpr        0.67
somew        0.66
Activations Density: 0.053%