Explanations
phrases starting with "in" followed by a number
prepositional phrases indicating compliance or alignment with guidelines or concepts
Negative Logits
SHIP              -0.84
cia               -0.72
greets            -0.70
DragonMagazine    -0.69
interacts         -0.69
induces           -0.64
haircut           -0.63
introduces        -0.63
reacts            -0.63
laughs            -0.62
Positive Logits
escap             1.03
versely           0.94
essence           0.91
avering           0.82
fact              0.81
cluded            0.80
herent            0.76
clusive           0.75
arg               0.75
alle              0.74
Activations Density: 0.237%
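
As a rough illustration of how a density figure like this might be computed, the sketch below assumes that "activations density" means the fraction of token positions at which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; none of them come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is counted per token position, and a position is
    "active" when its activation is strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 2 of which are active.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.237% would correspond to the feature firing on roughly 2 or 3 of every 1,000 tokens.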