INDEX
Explanations
phrases that indicate possession or belonging
Negative Logits
ugin    -0.15
isan    -0.15
yz      -0.15
лих     -0.15
xf      -0.14
fel     -0.14
auf     -0.14
اع      -0.14
aml     -0.14
810     -0.14
Positive Logits
UDA       0.16
hồi       0.15
¹Ħ        0.15
樣        0.14
terdam    0.14
imation   0.14
рун       0.14
indsight  0.14
ざ        0.14
isko      0.14
Activations Density 0.089%