INDEX
Explanations
mentions of actions involving the nose
occurrences of the term "sn" likely referring to derogatory or negative slang terms
Negative Logits
heid        -0.84
xual        -0.72
EMENT       -0.72
WAYS        -0.65
shire       -0.64
PowerPoint  -0.61
minus       -0.61
lack        -0.61
mine        -0.61
Templar     -0.58
Positive Logits
obb         1.17
appers      1.15
agging      1.15
atching     1.15
agged       1.14
apper       1.13
atches      1.12
appy        1.10
oot         1.09
ips         1.09
Activations Density 0.012%