Explanations
the word "os" and variations of it
Negative Logits
ãĤ¡         -0.71
rence       -0.66
bler        -0.65
taker       -0.64
OWS         -0.63
ASED        -0.63
BRE         -0.62
OUT         -0.61
ufact       -0.60
TextColor   -0.59
Positive Logits
hiba        1.46
heet        1.24
ophical     1.19
keleton     1.11
leep        1.07
omething    1.06
aurus       1.06
opher       1.05
ocial       1.04
mith        1.04
Activations Density 0.040%