INDEX
Explanations
words related to gates and related terms
Negative Logits
ortium      -0.97
Hots        -0.82
issance     -0.82
lihood      -0.77
ensional    -0.75
TING        -0.70
nces        -0.69
encers      -0.66
arus        -0.66
ynasty      -0.64
Positive Logits
keepers     1.38
keeper      1.32
ways        1.17
posts       1.07
keeping     1.07
stones      1.00
house       0.93
gates       0.91
fold        0.90
bell        0.88
Activations Density 0.008%