Explanations
phrases describing conditions or locations
Negative Logits
inium          -0.18
voy            -0.17
relation       -0.17
relation       -0.15
combination    -0.15
.connection    -0.15
sla            -0.15
Relation       -0.14
addition       -0.14
ston           -0.14
Positive Logits
front          0.41
front          0.35
-front         0.31
FRONT          0.31
_front         0.26
Front          0.26
Front          0.25
fronts         0.24
fron           0.23
frente         0.21
Activations Density: 0.247%