INDEX
Explanations
adjectives describing a specific type or quality
phrases that describe types or categories of things
Negative Logits
å§«    -0.89
INA    -0.76
æ©     -0.75
heid   -0.70
orsi   -0.70
omer   -0.69
æķ     -0.67
æĸ     -0.67
milo   -0.63
åħī    -0.63
Positive Logits
liest               0.86
etting              0.74
etter               0.73
thing               0.67
linger              0.65
rouse               0.64
appro               0.64
natureconservancy   0.63
antidote            0.62
lihood              0.58
Activations Density: 0.036%