INDEX
Explanations
instances of the word "rarely" followed by a non-zero activation value
instances of the word "rarely" and its synonyms, indicating infrequency
Negative Logits
Destruction   -0.72
Submission    -0.71
oğ            -0.70
arta          -0.69
utenberg      -0.68
uers          -0.67
jri           -0.65
uid           -0.65
andi          -0.65
Emirates      -0.64
Positive Logits
theless       1.24
entimes       1.06
icably        0.95
epad          0.87
dime          0.80
etheless      0.79
bothered      0.77
Asked         0.77
hesitate      0.77
pmwiki        0.77
Activations Density 0.009%