Explanations
phrases or questions that inquire about the functioning or effectiveness of something
Negative Logits
663     -0.19
664     -0.16
lit     -0.16
oland   -0.15
erais   -0.15
uya     -0.15
lif     -0.15
603     -0.15
ort     -0.15
irut    -0.14
Positive Logits
rious   0.15
ATUS    0.15
pcl     0.15
iče     0.15
Å       0.15
νει     0.14
.:      0.14
Bour    0.14
-Smith  0.14
діл     0.13
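The card does not say how these two lists were produced. For sparse-autoencoder features, a common approach (assumed here, not confirmed by this page) is to project the feature's decoder direction onto the model's unembedding matrix and keep the k lowest and k highest scores. The sketch below shows that idea with plain Python lists. The names `decoder_dir`, `unembed`, `logit_effects` and `top_and_bottom` are hypothetical and used only for illustration.

```python
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],    # hypothetical feature decoder vector, length d_model
    unembed: List[List[float]],  # hypothetical unembedding W_U, shape [d_model][vocab_size]
    vocab: List[str],            # token strings, length vocab_size
) -> List[Tuple[str, float]]:
    """Return (token, score) pairs: decoder_dir dotted with each column of W_U."""
    effects = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with column t of the unembedding.
        score = sum(decoder_dir[d] * unembed[d][t] for d in range(len(decoder_dir)))
        effects.append((token, score))
    return effects


def top_and_bottom(
    effects: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split scores into the k most negative and the k most positive tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of three tokens.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.0],
           [0.4, 0.2, -0.2]]
vocab = ["a", "b", "c"]
negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
print(negative, positive)
```

Sorting the whole vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) avoids the full sort.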
Activation density: 0.021%
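Activation density is usually read as the share of tokens on which the feature fires at all. Assuming "fires" means an activation strictly above zero (the card does not state its threshold), a density of 0.021% corresponds to about 21 active tokens per 100,000. A minimal sketch, with a hypothetical `activation_density` helper:

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    The default threshold of 0.0 is an assumption, not taken from the dashboard.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 21 active tokens out of 100,000 gives a density of about 0.021%.
acts = [1.0] * 21 + [0.0] * (100_000 - 21)
print(f"{activation_density(acts):.3f}%")
```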