Explanations
conditional phrases indicating hypothetical situations
Negative Logits
Token     Logit
願い      -0.15
amaz      -0.14
igel      -0.14
저        -0.14
Dy        -0.13
ニック    -0.13
しか      -0.13
olo       -0.13
ane       -0.13
iegel     -0.13
Positive Logits
Token         Logit
interested    0.25
interested    0.22
anyone        0.21
Anyone        0.21
Interested    0.20
Interested    0.19
interes       0.19
Anyone        0.19
anybody       0.19
interess      0.18
Activations Density 0.043%