Explanations
references to personal relationships and social interactions
Negative Logits
houſe     -0.73
клопе     -0.68
seamnă    -0.68
noDo      -0.65
acquisto  -0.65
purpoſe   -0.64
itſelf    -0.63
odotus    -0.63
pleaſure  -0.63
myſelf    -0.63
Positive Logits
later            0.57
baada            0.57
Shortly          0.56
Later            0.56
nachdem          0.56
successivamente  0.55
after            0.54
após             0.53
addGap           0.52
Subsequently     0.51
Activations Density: 0.703%