INDEX
Explanations
references to text formatting, especially focusing on the word "format"
mentions of different text formats and formatting instructions
Negative Logits
roma          -0.85
doms          -0.80
arma          -0.75
hiro          -0.72
worth         -0.70
guard         -0.70
nee           -0.68
ĺħ            -0.67
Michele       -0.66
vironment     -0.66
Positive Logits
ters          1.04
ting          0.83
atted         0.82
format        0.79
tered         0.77
tering        0.74
formats       0.74
etter         0.72
aldehyde      0.71
furt          0.70
Activations Density 0.037%