INDEX
Explanations
mentions of a specific proper noun starting with "Sw"
references to a specific individual or entity associated with "Sw."
Negative Logits
Token           Logit
uate            -0.75
âĸ¬âĸ¬          -0.73
============    -0.73
inates          -0.66
ously           -0.66
Mayhem          -0.63
uated           -0.62
çķ              -0.61
======          -0.59
ional           -0.59
POSITIVE LOGITS
Token           Logit
imming          1.32
indle           1.20
itched          1.17
ollen           1.12
itches          1.09
ifty            1.08
allows          1.08
allowed         1.07
inging          1.07
immer           1.05
Activations Density 0.016%