Explanations
Instances of the word "strong" and its variants, such as "strength", indicating a focus on strength or robustness.
Negative Logits
upaten      -0.67
atimes      -0.64
kaido       -0.62
Maus        -0.62
jub         -0.61
jima        -0.60
[]:         -0.60
délib       -0.60
oblivion    -0.59
bliss       -0.59
Positive Logits
STRONG      1.61
strong      1.60
strength    1.60
Strong      1.57
Strong      1.54
STRONG      1.49
strength    1.49
Strength    1.45
Strength    1.43
strong      1.43
Activations Density: 0.097%