INDEX
Explanations
phrases that emphasize ownership, significance, or the effect of actions
Negative Logits
plx      -0.17
ayout    -0.16
athy     -0.15
inne     -0.15
VRT      -0.14
oda      -0.14
inati    -0.14
serie    -0.14
リカ     -0.13
№№       -0.13
Positive Logits
lot      0.59
lot      0.47
Lot      0.45
ton      0.45
LOT      0.44
_lot     0.44
Lot      0.43
.lot     0.40
Ton      0.38
LOT      0.37
Activations Density: 0.098%