INDEX
Explanations
phrases expressing similarity or continuity
statements asserting the validity or truth of a proposition across various contexts
Negative Logits
helicop     -0.80
deserts     -0.70
casters     -0.67
calcul      -0.67
escal       -0.61
flares      -0.61
inker       -0.60
ramps       -0.60
TAG         -0.59
LIM         -0.59
Positive Logits
kefeller              0.79
VB                    0.79
same                  0.77
ItemThumbnailImage    0.75
Tradable              0.74
oother                0.74
":"/                  0.74
Same                  0.72
^^^^                  0.72
Berm                  0.71
Activation Density: 0.211%
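The page does not say how its logit lists or the density figure are computed. The sketch below assumes the common SAE-dashboard convention: a feature's logit weights are its decoder vector multiplied by the model's unembedding matrix, the lists show the k most negative and k most positive entries, and density is the fraction of tokens on which the feature's activation is above zero. The function names and all data are hypothetical toy values, not this feature's weights.

```python
# Minimal sketch under assumed conventions; toy data only.

def logit_weights(decoder_vec, unembed):
    """Project a feature's decoder vector (length d_model) through an
    unembedding matrix given as d_model rows, each of length vocab_size."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]

def top_and_bottom(weights, vocab, k):
    """Return the k most negative and k most positive (token, weight) pairs."""
    ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive

def activation_density(activations):
    """Fraction of tokens on which the feature fires (activation > 0)."""
    return sum(1 for a in activations if a > 0) / len(activations)

# Usage with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["same", "deserts", "Same", "ramps"]
decoder = [1.0, -0.5]
unembed = [[0.5, -0.4, 0.4, -0.3],
           [-0.5, 0.6, -0.6, 0.4]]
weights = logit_weights(decoder, unembed)
negative, positive = top_and_bottom(weights, vocab, k=2)
density = activation_density([0.0, 0.3, 0.0, 0.0, 1.2])
```

On this reading, a density of 0.211% would mean the feature is active on roughly 2 of every 1,000 tokens.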