Explanations
phrases indicating contrast or exceptions in discussions
Negative Logits
Token   Logit
behalf  -0.16
abyrin  -0.13
いる    -0.13
up      -0.13
BF      -0.13
pož     -0.13
Gord    -0.13
ạp      -0.13
かり    -0.13
zone    -0.13

Positive Logits
Token   Logit
aside    0.23
aside    0.22
Aside    0.21
Apart    0.19
apart    0.19
Apart    0.18
Aside    0.17
ought    0.17
jší      0.17
rics     0.15
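
Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative tokens. The source does not say how these particular values were computed, so the following is a minimal pure-Python sketch under that assumption. The helper names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, not real model weights.

```python
from typing import Dict, List, Sequence, Tuple


# Hypothetical helper: assumes a token's logit effect is the dot product of the
# feature's decoder direction with that token's unembedding column.
def logit_effects(
    decoder_dir: Sequence[float], unembed: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


# Hypothetical helper: returns the k most positive and k most negative tokens.
def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending by effect: the head holds the most negative tokens,
    # the tail the most positive ones.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = ranked[::-1][:k]
    negative = ranked[:k]
    return positive, negative


# Toy 2-dimensional example with made-up numbers.
decoder_dir = [1.0, -0.5]
unembed = {
    "aside": [0.2, -0.1],
    "behalf": [-0.1, 0.1],
    "zone": [0.0, 0.2],
}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=1)
print("positive:", positive)
print("negative:", negative)
```

In a real model the unembedding matrix has one column per vocabulary entry, so distinct entries that print identically (for example a token with and without a leading space) each get their own score, which would explain repeated strings such as "aside" in a list like this.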
Activation Density 0.020%
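
Activation density is usually read as the percentage of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0.0 (the source does not state the threshold or the sample size). The `activation_density` function and the toy data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: one firing position out of 5,000 tokens (made-up numbers).
acts = [0.0] * 5000
acts[1234] = 3.7
density = activation_density(acts)
print(f"Activation Density {density:.3f}%")  # prints "Activation Density 0.020%"
```

At 0.020%, a feature like this one would be active on roughly one token in every 5,000.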