INDEX
Explanations
words related to scientific experiments or processes
Negative Logits
alone       -0.67
sis         -0.67
Cola        -0.63
border      -0.61
cour        -0.61
oland       -0.60
roller      -0.59
arat        -0.59
align       -0.59
shield      -0.58
Positive Logits
glances      0.87
valuable     0.82
funds        0.81
scraps       0.78
copies       0.76
priceless    0.75
stolen       0.73
snippets     0.73
assets       0.73
information  0.72
Activations Density: 0.186%