INDEX
Explanations
expressions related to causation or influence
Negative Logits
raiſ             -0.68
itſelf           -0.66
Zd               -0.64
IsContent        -0.62
Wraith           -0.61
Scin             -0.60
onekana          -0.60
Quint            -0.60
autorytatywna    -0.60
drives           -0.58
Positive Logits
scolaires          0.67
SequentialGroup    0.65
оригіналу          0.62
γγε                0.60
ebvre              0.58
hohem              0.57
PreferredItem      0.56
CopyWith           0.55
lehetős            0.54
religieuses        0.54
Activations Density: 0.079%