INDEX
Explanations
informal expressions and colloquial language
Negative Logits
ImageContext  -0.54
validamos  -0.50
ostavi  -0.50
pleaſure  -0.44
ModelExpression  -0.44
مشين  -0.41
ตร์  -0.41
🟤  -0.41
存于互联网档案馆  -0.40
Архівовано  -0.40
Positive Logits
icoli  0.46
phor  0.44
stry  0.43
I  0.43
meta  0.42
fib  0.42
gotta  0.42
fish  0.41
Meta  0.41
hit  0.40
Activations Density: 0.010%