INDEX
Explanations
specific formatting patterns, such as unusual characters or sequences
punctuation and formatting elements in text
Negative Logits
equival      -0.86
reflex       -0.85
instinct     -0.84
shorthand    -0.84
agon         -0.80
persuasion   -0.79
sucker       -0.78
pse          -0.78
grip         -0.76
revol        -0.75
Positive Logits
Meanwhile        1.52
RELATED          1.49
Tickets          1.46
Also             1.46
Advertisements   1.45
Additionally     1.44
According        1.44
Among            1.43
About            1.42
Other            1.40
Activations Density 0.501%