Explanations
phrases asking or referring to the type or category of something
inquiries about types or categories of things
Negative Logits
chy          -0.74
obyl         -0.73
VEL          -0.71
MY           -0.67
Knight       -0.66
heid         -0.66
ederation    -0.66
Byrne        -0.63
enez         -0.62
anyon        -0.62
Positive Logits
etting        0.78
liest         0.72
faces         0.65
isodes        0.64
soever        0.63
rous          0.62
hearted       0.62
?]            0.62
happ          0.62
---------     0.62
Activation Density: 0.012%