INDEX
Explanations
prepositions and conjunctions that indicate relationships between concepts
Negative Logits
834       -0.17
enou      -0.16
schemas   -0.16
489       -0.15
iflower   -0.15
fidf      -0.15
mmas      -0.14
ebo       -0.14
isí       -0.14
546       -0.14
Positive Logits
sta       0.32
st        0.28
sv        0.25
utom      0.24
ment      0.24
bind      0.23
uts       0.23
vä        0.22
ber       0.22
vir       0.22
Activations Density 0.011%