Explanations
patterns and repetitions in descriptive phrases
Negative Logits
obi           -0.16
egov          -0.15
çĽijåIJ¬é¡µéĿ¢  -0.15
prime         -0.15
uy            -0.15
andas         -0.15
263           -0.15
entes         -0.14
ysz           -0.14
itous         -0.14
Positive Logits
ruž     0.18
umann   0.16
ver     0.16
æºIJ     0.16
source  0.15
Source  0.15
ños     0.14
same    0.14
awei    0.14
same    0.14
Activations Density 0.100%