INDEX
Explanations
elements of structured or academic writing
Negative Logits
émon    -0.16
adors   -0.16
ander   -0.15
ambah   -0.15
riott   -0.14
μον     -0.14
esson   -0.14
asje    -0.14
dek     -0.14
edith   -0.14
Positive Logits
pur     0.23
fel     0.21
pur     0.21
tort    0.20
magna   0.18
Pur     0.18
met     0.18
tell    0.18
eros    0.18
fel     0.17
Activations Density: 0.003%