INDEX
Explanations
references to tables and structured information
Negative Logits
aren     -0.18
morph    -0.18
morph    -0.18
ayah     -0.17
gaard    -0.16
æĸ       -0.15
Aren     -0.15
weren    -0.14
hadn     -0.14
Morph    -0.14
Positive Logits
bids     0.22
sav      0.22
condu    0.18
bad      0.17
milit    0.17
bid      0.17
teen     0.17
afford   0.17
answers  0.17
furn     0.17
Activations Density 0.222%