INDEX
Explanations
positive adjectives that convey improvement or enhancement
Negative Logits
similarity      -0.62
SpaceEngineers  -0.62
atum            -0.60
similarities    -0.58
stanbul         -0.56
alan            -0.56
rat             -0.55
athi            -0.55
utical          -0.54
authorization   -0.53
Positive Logits
terday   0.75
ible     0.72
anew     0.70
Enlarge  0.66
again    0.65
ISH      0.65
nell     0.65
enged    0.64
TER      0.63
IRED     0.62
Activations Density: 0.078%