INDEX
Explanations
references to high quantities or levels, often in relation to performance metrics or characteristics
Negative Logits
quiv    -0.16
oard    -0.15
monds   -0.14
cular   -0.14
gems    -0.14
ipa     -0.14
ège     -0.14
quet    -0.14
odon    -0.14
_IL     -0.14
Positive Logits
(er     0.20
/high   0.17
enough  0.17
indeed  0.17
/fast   0.16
781     0.16
Priest  0.14
aje     0.14
ummer   0.14
ieder   0.14
Activations Density 0.364%
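
As a rough illustration of what a density figure like this usually means, the sketch below computes the fraction of token positions on which a feature is active. It assumes density is "share of tokens with activation above a threshold"; the function name, the threshold default, and the sample data are all hypothetical and are not taken from this page or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 4 of 1,000 token positions.
sample = [0.0] * 996 + [0.8, 1.3, 0.2, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.364% would correspond to the feature firing on roughly 36 of every 10,000 tokens.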