Explanations
describing attributes or entities
Negative Logits
5  0.61
conducta  0.60
8  0.58
análise  0.57
7  0.57
ност  0.54
ون  0.52
viser  0.52
வ்வேறு  0.52
و  0.52
Positive Logits
on  0.64
to  0.61
at  0.55
around  0.54
from  0.53
fiery  0.52
favored  0.52
rockets  0.52
McDonalds  0.52
flamboyant  0.51
Activations Density 0.593%