Explanations
references to different types of flat surfaces or flat objects
Negative Logits
Token    Logit
him      -0.17
holes    -0.17
iest     -0.17
ahr      -0.15
iams     -0.15
بزر      -0.15
aan      -0.15
hole     -0.15
RIX      -0.15
iya      -0.15
Positive Logits
Token    Logit
ulence   0.40
ulent    0.35
iron     0.28
bed      0.25
ness     0.25
ting     0.25
foot     0.24
lining   0.23
buffers  0.23
bread    0.23
Activations Density: 0.008%
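
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a value could be computed. It assumes density means the fraction of sampled tokens on which the feature's activation is above zero; the page itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 8 of 100,000 tokens.
sample = [0.0] * 99_992 + [1.5] * 8
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.008%
```

A density this low would mean the feature is active on roughly one token in every 12,500, which is typical of a sparse feature that responds to a narrow concept.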