Explanations
prepositions indicating relationships or connections between entities
Negative Logits
s        -0.40
ÏĤ       -0.21
Ùĩ       -0.20
sÃŃ      -0.19
sburg    -0.19
slope    -0.19
ska      -0.18
sı       -0.17
sar      -0.17
sik      -0.16
Positive Logits
ingle        0.15
andal        0.15
andex        0.14
rone         0.14
ĭ            0.14
servername   0.14
oll          0.14
ULSE         0.13
oad          0.13
atch         0.13
Activations Density: 0.048%