Explanations
prefixes forming specific names
Negative Logits
:/            0.47
Basically     0.45
<:            0.44
explicitly    0.43
Specifically  0.43
CQG           0.43
<>            0.43
inapplicable  0.43
concat        0.43
Presumably    0.43
Positive Logits
ida     0.63
adors   0.62
adores  0.59
ari     0.57
al      0.56
adora   0.56
ila     0.55
ag      0.55
isha    0.54
oka     0.54
Activations Density 0.856%