Explanations
instances of comparison and contrast
Negative Logits
reed      -0.17
fon       -0.15
Beit      -0.14
-gnu      -0.14
andas     -0.14
èĢ        -0.14
Worlds    -0.13
jie       -0.13
ree       -0.13
_RO       -0.13
Positive Logits
stead     0.34
stead     0.31
_inst     0.30
inst      0.29
ather     0.29
-inst     0.29
opposed   0.28
ÃŃsto     0.27
Inst      0.24
ATHER     0.24
Activations Density: 0.099%