INDEX
Explanations
instances of authorship or submission attribution in the text
Negative Logits
ÄĽn         -0.16
abeth       -0.15
Habit       -0.15
ilder       -0.14
abis        -0.14
stick       -0.14
incer       -0.14
çĻ          -0.14
Families    -0.13
inee        -0.13
POSITIVE LOGITS
Emm         0.17
avl         0.14
IRTH        0.14
ÏĦοÏħÏĤ     0.14
onto        0.14
Ludwig      0.14
slice       0.14
egg         0.13
neck        0.13
문íĻĶ       0.13
Activations Density 0.011%