INDEX
Explanations
occurrences of the word "and" as well as related references to relationships and coherence in arguments
Negative Logits
reon
-0.17
Ung
-0.17
廳
-0.16
enor
-0.14
厅
-0.14
inan
-0.13
iban
-0.13
andre
-0.13
rosso
-0.13
Sab
-0.13
Positive Logits
rather
0.52
rather
0.46
not
0.42
chứ
0.42
NOT
0.40
Rather
0.38
ではなく
0.38
değil
0.36
Rather
0.35
不是
0.33
Activations Density 0.335%