Explanations
phrases referring to certain characteristics or types mentioned in a comparison
references to types or categories of things
Negative Logits
ansas         -0.71
LESS          -0.67
Shut          -0.63
MAP           -0.63
CD            -0.61
hole          -0.60
Downloadha    -0.59
idav          -0.58
Goodbye       -0.56
adan          -0.56
Positive Logits
magnitude     1.48
caliber       1.43
calib         1.36
nature        1.32
stature       1.30
importance    1.24
proportions   1.23
il            1.23
size          1.20
sorts         1.17
Activations Density: 0.179%