INDEX
Explanations
specific mentions of the word "format"
mentions of different formats or structures
Negative Logits
Token     Logit
roma      -0.84
doms      -0.82
guard     -0.73
arma      -0.73
nee       -0.72
hiro      -0.72
ĺħ        -0.70
ortium    -0.69
minent    -0.69
brow      -0.68
Positive Logits
Token     Logit
ters      1.09
ting      0.88
atted     0.85
format    0.84
tered     0.80
aldehyde  0.79
Format    0.78
formats   0.77
etter     0.77
tering    0.77
Activations Density: 0.027%