Explanations
abstract concepts followed by 'of'
Negative Logits
埥    0.62
。\    0.56
тощо    0.55
玤    0.54
$=\    0.52
qualiter    0.51
ይች    0.50
ంతో    0.50
😘    0.50
alebo    0.50
Positive Logits
0.45
‘    0.44
pandemics    0.43
shipbuilding    0.41
introspection    0.41
–    0.40
pervasive    0.40
bureaucratic    0.40
profound    0.40
“    0.39
Activations Density 0.883%