Explanations
phrases indicating examples or instances
Negative Logits
ibold     -0.17
망        -0.15
IP        -0.15
ohon      -0.15
aina      -0.15
ohana     -0.15
ohan      -0.14
č         -0.13
STALL     -0.13
itness    -0.13
Positive Logits
those     0.69
those     0.63
Those     0.59
Those     0.57
那些      0.46
那种      0.45
ceux      0.34
那个      0.34
تلك       0.32
celui     0.30
Activations Density 0.167%