INDEX
Explanations
phrases that express comparative relationships or intensifiers
Negative Logits
Published    -0.70
igger        -0.62
ULAR         -0.59
dun          -0.57
Explore      -0.56
eruption     -0.56
bro          -0.54
concise      -0.54
Begin        -0.53
Sit          -0.53
Positive Logits
lihood       0.76
erto         0.68
tainment     0.66
anamo        0.65
pection      0.62
intendent    0.61
ours         0.61
ernel        0.61
pecting      0.61
ohl          0.60
Activations Density 0.035%