INDEX
Explanations
references to metaphors and figurative language
Negative Logits
Token            Logit
AccessException  -0.17
iosity           -0.16
tlement          -0.16
uled             -0.15
wich             -0.15
ετ               -0.15
Disappear        -0.15
τιν              -0.15
меть             -0.15
rada             -0.15
Positive Logits
Token      Logit
analogy    0.16
celik      0.15
Woodward   0.15
ynth       0.14
.INSTANCE  0.14
astos      0.14
Guth       0.14
575        0.14
Tal        0.13
anvas      0.13
Activations Density 0.035%