INDEX
Explanations
phrases indicating additional information or emphasis
phrases or concepts indicating caveats or additional notes
Negative Logits
twitch       -0.75
Constructed  -0.74
hard         -0.66
sil          -0.64
arij         -0.63
spot         -0.62
isf          -0.60
shell        -0.59
iminary      -0.59
ophys        -0.59
Positive Logits
lihood        0.81
epad          0.68
_>            0.64
mentioning    0.64
indexes       0.63
omsday        0.63
DonaldTrump   0.62
incidentally  0.60
è£ı           0.60
imagine       0.59
Activations Density 0.022%