INDEX
Explanations
assertions or claims of correctness or accuracy
Negative Logits
Flavoring    -0.76
Pastebin     -0.75
scrim        -0.74
gins         -0.62
hens         -0.61
convol       -0.61
heed         -0.61
Gong         -0.60
Cth          -0.59
Bund         -0.59
Positive Logits
headed       0.82
utherford    0.77
footed       0.76
terday       0.69
eyed         0.69
Bir          0.67
aez          0.67
insofar      0.67
Osw          0.66
about        0.64
Activations Density: 0.049%