INDEX
Explanations
specific vocabulary related to evaluations or descriptions of actions and conditions
Negative Logits
Token      Logit
ano        -0.15
iets       -0.15
arged      -0.14
utes       -0.14
oplevel    -0.14
@a         -0.13
thôi       -0.13
oho        -0.13
asper      -0.12
ンディ     -0.12
Positive Logits
Token      Logit
fact       0.24
uario      0.17
fact       0.16
manner     0.16
inya       0.15
vise       0.15
γου        0.15
idon       0.15
اینکه      0.14
ĵ¨         0.14
Activations Density: 0.242%