INDEX
Explanations
various forms of the word "reason" and related concepts of purpose and justification
Negative Logits
Token     Logit
мир       -0.07
anka      -0.07
riel      -0.06
ESA       -0.06
ç͍çļĦ    -0.06
iller     -0.06
ikan      -0.06
erk       -0.06
vit       -0.06
jet       -0.06
Positive Logits
Token     Logit
alone     0.09
alone     0.09
之一      0.09
among     0.07
annes     0.07
sake      0.07
وغير      0.07
NAL       0.07
among     0.07
nement    0.07
Activations Density 0.003%