Explanations
terms related to abstract and concrete concepts in various contexts
Negative Logits
unta         -0.17
acha         -0.16
upe          -0.15
istrovství   -0.15
уча          -0.15
rag          -0.15
辰           -0.14
olla         -0.14
ot           -0.14
orus         -0.14
Positive Logits
ed           0.23
ivism        0.18
ified        0.18
itious       0.18
edly         0.17
edImage      0.17
angelo       0.16
iment        0.15
iction       0.15
jungle       0.15
Activations Density 0.016%