INDEX
Explanations
comparisons of similarity or equality
comparisons emphasizing equality or similarity
Negative Logits
DIT      -0.74
MAP      -0.72
bryce    -0.72
POST     -0.70
mt       -0.66
Ry       -0.65
UL       -0.64
runs     -0.64
UCT      -0.64
ULE      -0.63
Positive Logits
pired         0.80
ptin          0.77
scrut         0.76
advertised    0.75
vain          0.75
iffe          0.73
eloqu         0.70
schild        0.69
itzer         0.67
iable         0.66
Activations Density 0.036%