Explanations
occurrences of the word "from."
Negative Logits
rospy         -0.56
они           -0.55
깐            -0.51
bezeichneter  -0.47
вони          -0.47
вона          -0.45
он            -0.44
він           -0.44
она           -0.44
keep          -0.43
Positive Logits
afar       1.38
whence     1.17
within     1.16
scratch    1.14
across     1.10
within     1.10
abroad     1.06
elsewhere  1.02
the        1.01
segi       0.98
Activations Density 0.271%