INDEX
Explanations
references to words and language
"word" or similar variations
word followed by other words
Negative Logits
♂️            -0.61
betrek        -0.55
]-->          -0.54
يتيمه         -0.53
Shreve        -0.53
πως           -0.52
FileChooser   -0.50
Shun          -0.50
UnitTesting   -0.50
Normdatei     -0.49
Positive Logits
Word          0.85
WORD          0.83
Words         0.82
words         0.79
WORDS         0.78
Word          0.72
word          0.70
Words         0.67
sumpay        0.67
հղումներ      0.64
Activations Density 0.218%