INDEX
Explanations
words and phrases associated with actions or movements
actions followed by direction
Negative Logits
Masyarakat    -0.39
among         -0.33
représent     -0.33
ÁND           -0.33
śnia          -0.33
ACIONAL       -0.32
capan         -0.31
Budaya        -0.30
Misalnya      -0.30
popular       -0.29
Positive Logits
ſelben        0.86
<unused41>    0.86
<unused14>    0.86
[@BOS@]       0.86
<unused8>     0.86
<unused42>    0.85
<unused43>    0.85
<unused28>    0.85
<unused17>    0.85
<unused21>    0.85
Activations Density: 0.056%