Explanations
structure and organization in text, particularly related to lists and parameters
before personal pronouns
Negative Logits
Token         Logit
こいつ        -0.52
ček           -0.52
précis        -0.52
rire          -0.50
üttel         -0.49
vagas         -0.48
contextLoads  -0.48
Whipple       -0.48
videre        -0.47
вня           -0.47
Positive Logits
Token         Logit
preference    1.04
preferring    1.04
prefer        1.00
prefers       0.92
Prefer        0.91
Preference    0.90
preferred     0.83
Advantages    0.77
prefer        0.76
preference    0.74
Activations Density: 0.731%