INDEX
Explanations
phrases indicating appearances or assumptions about situations or objects
phrases indicating perceptions or appearances
Negative Logits
ridge  -0.78
ãĢı  -0.76
ierrez  -0.70
culosis  -0.66
ement  -0.64
tex  -0.64
iamond  -0.63
kefeller  -0.63
arette  -0.63
avorite  -0.62
Positive Logits
pires  0.79
piring  0.74
ļé  0.71
ãĥĢ  0.67
XM  0.63
inar  0.61
insur  0.61
Ĥª  0.59
ENS  0.59
è£ıç  0.58
Activations Density 0.152%