INDEX
Explanations
verbs related to performance or behavior
act like, in, of, Accordingly
Negative Logits
getahuan        -0.39
simple          -0.38
Simple          -0.38
encontra        -0.38
information     -0.38
anteriormente   -0.36
yh              -0.36
justiça         -0.35
Simple          -0.35
threshold       -0.35
Positive Logits
Мексичка        0.58
stunts          0.54
utafitiHapana   0.54
(blank)         0.53
behave          0.53
Act             0.53
Infórmanos      0.53
act             0.52
queſta          0.52
+#+             0.51
Activations Density 0.009%