Explanations
numerical codes or signatures
occurrences of a specific token
Negative Logits
lihood        -0.69
heid          -0.65
manship       -0.65
Mechdragon    -0.63
folk          -0.63
Polo          -0.62
Loans         -0.62
Brach         -0.61
chel          -0.61
ORGE          -0.60
Positive Logits
adle          1.34
acker         1.30
ackers        1.23
acking        1.18
acked         1.12
ushed         1.10
anks          1.07
utch          1.07
umb           1.06
umble         1.05
Activations Density: 0.025%