INDEX
Explanations
mentions of a specific number
numerical references or identifiers related to data or measurements
Negative Logits
shape      -0.67
ardless    -0.66
ifully     -0.66
fare       -0.64
liga       -0.64
iful       -0.62
orem       -0.62
velt       -0.61
fall       -0.61
urus       -0.59
Positive Logits
00          0.93
Sins        0.75
th          0.71
mm          0.70
rd          0.70
olog        0.69
acht        0.69
oche        0.68
åĤ          0.68
Thieves     0.67
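The dashboard does not say how the logit lists were produced. A common approach is to project the feature's decoder direction through the model's unembedding and keep the most negative and most positive entries. The sketch below assumes that approach. `top_logit_effects`, `decoder_dir` and `unembed` are hypothetical names, and the inputs are toy vectors rather than real model weights.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive direct logit effects.

    Hypothetical inputs (assumption, not the dashboard's documented method):
    `decoder_dir` is the feature's direction in the residual stream, and
    `unembed` maps each token string to its unembedding column, which has
    the same length as `decoder_dir`.
    """
    # Direct effect of the feature on each token's logit: a dot product.
    effects = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    # Sort ascending so the most negative effects come first.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    toy_unembed = {"a": [0.2, 0.1], "b": [-0.4, 0.3], "c": [0.9, -0.2]}
    neg, pos = top_logit_effects([1.0, -0.5], toy_unembed, k=1)
    print(neg, pos)
```

With the toy inputs, token `b` has the most negative effect (about -0.55) and token `c` the most positive (about 1.0). On a real model, the two lists would correspond to the Negative Logits and Positive Logits columns above, but only if the dashboard actually uses this projection.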
Activation density: 0.033%
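An activation density of 0.033% means the feature fired on about 33 of every 100,000 token positions in whatever sample the dashboard used. The minimal sketch below shows that arithmetic. It assumes a token counts as "active" when its activation is strictly above a threshold of 0.0. The dashboard does not state its criterion, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold; the
    dashboard's own criterion and token sample are not documented here.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # Toy data: 33 active positions out of 100,000.
    acts = [0.0] * 99_967 + [1.2] * 33
    print(f"{activation_density(acts) * 100:.3f}%")
```

Running the toy example prints `0.033%`, which matches the figure above.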